The Long-Run Distribution of Stochastic Gradient Descent: A Large Deviations Approach
Offered By: Erwin Schrödinger International Institute for Mathematics and Physics (ESI) via YouTube
Course Description
Overview
Explore the long-run behavior of stochastic gradient descent (SGD) in non-convex optimization problems in this 25-minute conference talk, delivered at the workshop "One World Optimization Seminar in Vienna" at the Erwin Schrödinger International Institute for Mathematics and Physics (ESI). The talk analyzes the long-run state distribution of SGD using large deviation theory and the theory of randomly perturbed dynamical systems. It shows that this distribution resembles the Boltzmann-Gibbs distribution of equilibrium thermodynamics, with the step-size playing the role of temperature and with energy levels determined by the objective and the statistics of the noise. Key findings: critical regions are visited exponentially more often than non-critical regions; the iterates concentrate around the minimum energy state; the visitation frequency of each component of critical points is governed by its energy level; and minimizing components dominate non-minimizing ones in visitation frequency.
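The Boltzmann-Gibbs picture described above can be illustrated with a small simulation. The following is a minimal sketch, not taken from the talk: it assumes a hypothetical one-dimensional tilted double-well objective f(x) = x^4/4 - x^2/2 + 0.2x, additive Gaussian gradient noise with constant standard deviation `sigma`, and a constant step-size `eta`. Under these assumptions the energy is simply proportional to the objective, which need not hold for general noise. The function name `fraction_in_deeper_well` and all parameter values are illustrative choices.

```python
import random


def grad(x):
    # Gradient of the assumed objective f(x) = x**4/4 - x**2/2 + 0.2*x.
    # Its minima lie near x = -1.09 (deeper well) and x = 0.88 (shallower well).
    return x**3 - x + 0.2


# Approximate location of the local maximum that separates the two wells.
BARRIER = 0.21


def fraction_in_deeper_well(eta, sigma=2.0, steps=200_000, seed=0):
    """Run constant step-size SGD and return the fraction of iterates
    that fall on the deeper well's side of the barrier."""
    rng = random.Random(seed)
    x = 0.88  # start in the shallower well on purpose
    hits = 0
    for _ in range(steps):
        # Assumed noise model: exact gradient plus Gaussian noise.
        noisy_grad = grad(x) + sigma * rng.gauss(0.0, 1.0)
        x -= eta * noisy_grad
        if x < BARRIER:
            hits += 1
    return hits / steps


low = fraction_in_deeper_well(eta=0.05)   # "colder" run
high = fraction_in_deeper_well(eta=0.1)   # "hotter" run
print(f"eta=0.05: fraction of iterates in deeper well = {low:.3f}")
print(f"eta=0.10: fraction of iterates in deeper well = {high:.3f}")
```

In this toy setting the iterates should spend most of their time in the deeper well even though they start in the shallower one, and the smaller step-size should concentrate them there more strongly, in line with the step-size acting as a temperature.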
Syllabus
Panayotis Mertikopoulos - The Long-Run Distribution of Stochastic Gradient Descent: A Large...
Taught by
Erwin Schrödinger International Institute for Mathematics and Physics (ESI)
Related Courses
Calculus II: Multivariable Functions (Delft University of Technology via edX)
Mathematics 1 Part 1: Differential Calculus (London School of Economics and Political Science via edX)
Mild Dissipative Diffeomorphisms of the Disk with Zero Entropy - Lecture 2 (Instituto de Matemática Pura e Aplicada via YouTube)
Mild Dissipative Diffeomorphisms of the Disk with Zero Entropy - Lecture 3 (Instituto de Matemática Pura e Aplicada via YouTube)
A Family of Rational Maps with One Free Critical Point (Banach Center via YouTube)