Center for Nonlinear Studies
Monday, November 06, 2023
11:00 AM - 12:00 PM
CNLS Conference Room (TA-3, Bldg 1690)

Seminar

Stochastic gradient descent (SGD): a unified algorithmic overview

Paul Rodriguez
Pontificia Universidad Catolica del Peru

Gradient descent (GD) is a well-known first-order optimization method that uses the gradient of the loss function, along with a step size (or learning rate), to iteratively update the solution. When the loss (cost) function depends on datasets of large cardinality, as is typical in deep learning (DL), GD becomes impractical. In this scenario, stochastic GD (SGD), which uses a noisy gradient approximation computed over a random fraction of the dataset, has become crucial. There exist several variants of, and improvements on, "vanilla" SGD, such as SGD+momentum, Adagrad, RMSprop, Adadelta, Adam, Nadam, AdaBelief, etc., which are usually provided as black boxes by most DL libraries (TensorFlow, PyTorch, etc.). The primary objective of this talk is to open these black boxes by explaining their "evolutionary path", in which each SGD variant may be understood as a set of add-on features over vanilla SGD. Furthermore, since the hyper-parameters associated with each SGD variant directly influence its performance, they will also be assessed from a theoretical and computational point of view.
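As a rough illustration of the "evolutionary path" the abstract describes, the following minimal NumPy sketch shows three points along it: vanilla SGD, SGD+momentum, and Adam, each layering features onto the previous rule. The function names, hyper-parameter defaults, and the particular momentum formulation are illustrative choices, not the speaker's code.

import numpy as np

# w: parameter vector; g: stochastic gradient, computed over a random
# mini-batch of the data (the "noisy gradient approximation" above).

def sgd_step(w, g, lr=0.01):
    # Vanilla SGD: move against the stochastic gradient.
    return w - lr * g

def sgd_momentum_step(w, g, v, lr=0.01, beta=0.9):
    # Add-on: momentum. Accumulate a running average of past gradients
    # (the "velocity" v) to damp oscillations and speed up progress.
    v = beta * v + g
    return w - lr * v, v

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Add-ons: momentum (first moment m) plus a per-coordinate adaptive
    # step size (second moment v), with bias correction for early steps.
    # t is the iteration counter, starting at 1.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

In a training loop one would draw a mini-batch, compute g by backpropagation, and apply one of these steps, carrying the state (v, or m, v, t) forward between iterations.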

Host: Brendt Wohlberg, T5