KungFu - Making Training in Distributed Machine Learning Adaptive
Offered By: USENIX via YouTube
Course Description
Overview
Syllabus
Intro
Training in Distributed ML Systems
Parameters in Distributed ML Systems
Issues with Empirical Parameter Tuning
Proposals for Automatic Parameter Adaptation
Open Challenges
Existing Approaches for Adaptation
KungFu Overview
Adaptation Policies
Example: Adaptation Policy for GNS
Embedding Monitoring Inside Dataflow. Problem: high monitoring cost reduces adaptation benefit. Idea: improve efficiency by adding monitoring operators to the dataflow graph.
Challenges of Dataflow Collective Communication
Making Collective Communication Asynchronous. Idea: use asynchronous collective communication.
Issues When Adapting System Parameters
Distributed Mechanism for Parameter Adaptation
How Effectively Does KungFu Adapt?
Conclusions: KungFu
Taught by
USENIX
Related Courses
Scalable Data Science
Indian Institute of Technology, Kharagpur via Swayam
Data Science and Engineering with Spark
Berkeley University of California via edX
Data Science on Google Cloud: Machine Learning
Google via Qwiklabs
Modern Distributed Systems
Delft University of Technology via edX
Oort - Efficient Federated Learning via Guided Participant Selection
USENIX via YouTube