Center for Nonlinear Studies
Monday, November 06, 2023
11:00 AM - 12:00 PM
CNLS Conference Room (TA-3, Bldg 1690)


Stochastic gradient descent (SGD): a unified algorithmic overview

Paul Rodriguez
Pontificia Universidad Catolica del Peru

Gradient descent (GD) is a well-known first-order optimization method that uses the gradient of the loss function, along with a step size (or learning rate), to iteratively update the solution. When the loss (cost) function depends on datasets of large cardinality, as is typical in deep learning (DL), GD becomes impractical. In this scenario, stochastic GD (SGD), which uses a noisy gradient approximation (computed over a random fraction of the dataset), has become crucial. There exist several variants of and improvements over "vanilla" SGD, such as SGD+momentum, Adagrad, RMSprop, Adadelta, Adam, Nadam, AdaBelief, etc., which most DL libraries (TensorFlow, PyTorch, etc.) provide as black boxes. The primary objective of this talk is to open such black boxes by explaining their "evolutionary path", in which each SGD variant may be understood as a set of add-on features over vanilla SGD. Furthermore, since the hyper-parameters associated with each SGD variant directly influence its performance, they will also be assessed from a theoretical and computational point of view.
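
The "add-on" view described above can be illustrated with a minimal pure-Python sketch. It is an illustration under stated assumptions, not the speaker's formulation: the names make_step, minibatch_grad and run_demo, the toy problem (fitting y = 2x + 1), and all hyper-parameter values are hypothetical. Momentum is written here in its exponential-moving-average form with bias correction (the form Adam uses), which differs from the classical heavy-ball update by a rescaling of the step size.

```python
import math
import random


def make_step(lr=0.05, beta1=0.0, adaptive=False, beta2=0.999, eps=1e-8):
    """Build an update function; each option is an add-on over vanilla SGD.

    beta1 > 0      adds a moving average of past gradients (momentum).
    adaptive=True  adds per-coordinate scaling by a moving average of g**2.
    Both together give an Adam-style update (illustrative, not a library API).
    """
    state = {"t": 0, "m": None, "v": None}

    def step(w, g):
        if state["m"] is None:
            state["m"] = [0.0] * len(w)
            state["v"] = [0.0] * len(w)
        state["t"] += 1
        t = state["t"]
        new_w = []
        for i in range(len(w)):
            # First moment: with beta1 == 0 this is just the raw gradient.
            m = beta1 * state["m"][i] + (1.0 - beta1) * g[i]
            state["m"][i] = m
            direction = m / (1.0 - beta1 ** t)  # bias correction
            if adaptive:
                # Second moment: rescales each coordinate individually.
                v = beta2 * state["v"][i] + (1.0 - beta2) * g[i] ** 2
                state["v"][i] = v
                v_hat = v / (1.0 - beta2 ** t)
                direction /= math.sqrt(v_hat) + eps
            new_w.append(w[i] - lr * direction)
        return new_w

    return step


def loss(w, xs, ys):
    """Mean squared error of the line w[0] * x + w[1]."""
    return sum((w[0] * x + w[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def minibatch_grad(w, xs, ys, idx):
    """Noisy gradient of the loss, computed only on the samples in idx."""
    g0 = g1 = 0.0
    for i in idx:
        r = w[0] * xs[i] + w[1] - ys[i]
        g0 += 2.0 * r * xs[i]
        g1 += 2.0 * r
    return [g0 / len(idx), g1 / len(idx)]


def run_demo(steps=1000, batch=10, seed=0):
    """Train three variants on toy data; return {name: (start, final) loss}."""
    xs = [2.0 * i / 100 for i in range(100)]
    ys = [2.0 * x + 1.0 for x in xs]
    variants = {
        "sgd": {},
        "sgd+momentum": {"beta1": 0.9},
        "adam-style": {"beta1": 0.9, "adaptive": True},
    }
    results = {}
    for name, options in variants.items():
        rng = random.Random(seed)  # same minibatch sequence for every variant
        step = make_step(lr=0.05, **options)
        w = [0.0, 0.0]
        start = loss(w, xs, ys)
        for _ in range(steps):
            idx = rng.sample(range(len(xs)), batch)
            w = step(w, minibatch_grad(w, xs, ys, idx))
        results[name] = (start, loss(w, xs, ys))
    return results


if __name__ == "__main__":
    for name, (start, final) in run_demo().items():
        print(f"{name}: loss {start:.3f} -> {final:.3e}")
```

With both options switched off, the update reduces to w - lr * g, so the same function covers vanilla SGD. The only hyper-parameters are lr, beta1, beta2 and eps, so their effect on a variant can be explored by changing one argument at a time.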

Host: Brendt Wohlberg, T5