Deep Learning Meets Nonparametric Regression: Are Weight-decayed DNNs Locally Adaptive?
Offered By: USC Information Sciences Institute via YouTube
Course Description
Overview
This talk revisits deep neural networks from the statistical viewpoint of nonparametric regression. It asks whether weight-decayed DNNs are locally adaptive, that is, whether they can achieve minimax-optimal rates over total-variation and Besov function classes where linear methods, including NTK-based ones, fall short.
Syllabus
Intro
From a statistical point of view, the success of DNNs is a mystery.
Why do Neural Networks work better?
The "adaptivity" conjecture
NTKs are strictly suboptimal for locally adaptive nonparametric regression
Are DNNs locally adaptive? Can they achieve optimal rates for TV-classes/Besov classes?
Background: Splines are piecewise polynomials
Background: Truncated power basis for splines (see the first sketch after the syllabus)
Weight decay = Total Variation Regularization (see the second sketch after the syllabus)
Weight-decayed L-layer PNN is equivalent to Sparse Linear Regression with learned basis functions
Main theorem: Parallel ReLU DNN approaches the minimax rates as it gets deeper.
Comparing to classical nonparametric regression methods
Examples of Functions with Heterogeneous Smoothness
Step 2: Approximation Error Bound
Summary of take-home messages
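The spline background items in the syllabus can be made concrete in a few lines of NumPy. The sketch below is illustrative only, not code from the talk: it builds the degree-m truncated power basis (the polynomials 1, x, ..., x^m plus one truncated term (x - t)_+^m per knot t) and fits a piecewise-linear target by least squares. The function name truncated_power_basis, the knot placement, and the toy target are my own choices.

```python
import numpy as np

def truncated_power_basis(x, knots, degree=1):
    """Truncated power basis for degree-`degree` splines:
    1, x, ..., x^degree, plus (x - t)_+^degree for each knot t."""
    cols = [x ** j for j in range(degree + 1)]
    cols += [np.maximum(x - t, 0.0) ** degree for t in knots]
    return np.stack(cols, axis=1)

# Fit a piecewise-linear (degree-1) spline to a kinked target by least squares.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.abs(x - 0.3) + 0.5 * np.maximum(x - 0.7, 0.0) + 0.05 * rng.standard_normal(x.size)

X = truncated_power_basis(x, knots=np.linspace(0.05, 0.95, 19), degree=1)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("fitted RMSE:", np.sqrt(np.mean((X @ coef - y) ** 2)))
```

Note that ReLU(x - t) = (x - t)_+, so a one-hidden-layer ReLU network with unit first-layer weights and biases -t_k computes exactly these truncated features; this is one way to see the bridge between splines and the parallel ReLU networks with learned basis functions discussed in the talk.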
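The "Weight decay = Total Variation Regularization" item rests on a standard rescaling argument: for a unit v * ReLU(w*x + b), the map (w, b, v) -> (c*w, c*b, v/c) leaves the function unchanged for any c > 0, and minimizing the squared-weight penalty over c yields |v*w|. Summed over units, weight decay therefore acts like an L1 penalty on the per-unit products, which for 1D networks controls the total variation of the fitted function's derivative. Below is a minimal numerical check of the rescaling identity, with arbitrary illustrative values rather than anything from the talk.

```python
import numpy as np

# Rescaling (w, b, v) -> (c*w, c*b, v/c) preserves v * relu(w*x + b).
# The weight-decay penalty (w^2 + v^2) / 2, minimized over such rescalings,
# equals |v * w| (attained at c = sqrt(|v| / |w|)).
w, v = 3.0, -0.4
cs = np.linspace(0.1, 5.0, 10_000)
penalties = ((cs * abs(w)) ** 2 + (abs(v) / cs) ** 2) / 2

print(penalties.min())  # approx 1.2
print(abs(v * w))       # exactly 1.2
```

An L1-type penalty of this form is what gives locally adaptive estimators such as trend filtering their advantage over linear smoothers, which is the sense in which the talk connects weight decay to local adaptivity.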
Taught by
USC Information Sciences Institute
Related Courses
Neural Networks for Machine Learning - University of Toronto via Coursera
機器學習技法 (Machine Learning Techniques) - National Taiwan University via Coursera
Machine Learning Capstone: An Intelligent Application with Deep Learning - University of Washington via Coursera
Прикладные задачи анализа данных (Applied Problems of Data Analysis) - Moscow Institute of Physics and Technology via Coursera
Leading Ambitious Teaching and Learning - Microsoft via edX