
Dorylus - Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads

Offered By: USENIX via YouTube

Tags

OSDI (Operating Systems Design and Implementation), Deep Learning, Scalability, Distributed Computing, Serverless Computing

Course Description

Overview

This 15-minute conference talk from OSDI '21 presents Dorylus, a distributed system for training Graph Neural Networks (GNNs). Dorylus uses serverless computing to avoid two problems that arise with billion-edge graphs: the high cost of GPU servers and their limited memory. The talk explains how computation separation enables a deep, bounded-asynchronous pipeline that hides network latency. It also explains why CPU servers offer the best performance-per-dollar for large graphs and how adding Lambda threads can significantly improve efficiency. Further topics include Dorylus's architecture, how it scales GNN training, and how it outperforms existing systems. The talk closes with the challenges of using serverless computing, namely limited resources and a limited network, and the solutions Dorylus uses to address them.
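The overview mentions a bounded-asynchronous pipeline. The Python sketch below illustrates the general idea of bounded staleness. It is a hypothetical, generic illustration and not Dorylus's implementation, which this listing does not describe. Graph data chunks progress through epochs independently, but no chunk may run more than `staleness_bound` epochs ahead of the slowest one. The class and method names (`BoundedStalenessPipeline`, `can_advance`, `advance`) are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class BoundedStalenessPipeline:
    """Hypothetical sketch of bounded asynchrony across graph data chunks.

    Each chunk tracks the epoch it has reached. A chunk may start its next
    epoch only if doing so keeps it within `staleness_bound` epochs of the
    slowest chunk. This is a generic bounded-staleness rule, not Dorylus code.
    """

    num_chunks: int
    staleness_bound: int
    epochs: list = field(init=False)

    def __post_init__(self):
        # A bound of 0 would never let any chunk move first, so require >= 1.
        # A bound of 1 behaves like a per-epoch barrier (fully synchronous).
        if self.staleness_bound < 1:
            raise ValueError("staleness_bound must be at least 1")
        self.epochs = [0] * self.num_chunks

    def can_advance(self, chunk: int) -> bool:
        # Moving to epoch e+1 is allowed only if it stays within the bound
        # relative to the slowest chunk's current epoch.
        return self.epochs[chunk] + 1 - min(self.epochs) <= self.staleness_bound

    def advance(self, chunk: int) -> bool:
        # Returns False when the chunk must wait for slower chunks to catch up.
        if not self.can_advance(chunk):
            return False
        self.epochs[chunk] += 1
        return True


if __name__ == "__main__":
    p = BoundedStalenessPipeline(num_chunks=3, staleness_bound=2)
    print(p.advance(0), p.advance(0), p.advance(0))  # third call must wait
    print(p.epochs)
```

Raising `staleness_bound` lets fast chunks keep working instead of idling at a barrier. This is how an asynchronous pipeline can overlap computation with slow network transfers. The trade-off is that chunks may compute with more outdated values, which the talk's syllabus addresses under "Minimizing Effects of Asynchrony on Convergence."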

Syllabus

Intro
Machine Learning
Graph Neural Networks
Stages of a Graph Neural Network
GPUs Are Not a Good Fit for Graph Operations
Combining CPUs and GPUs is Cost-Ineffective
Using Many CPU Servers Can Still Be Expensive
Key Insight: Serverless Fits Our Goals
Serverless Achieves Low-Cost, Scalable Efficiency
Challenges with Using Serverless
Challenge 1: Limited Resources
Solution: Computation Separation
Dorylus Architecture
Flow of Decomposed Tasks
Challenge 2: Limited Network
Solution: Create Pipeline of Decomposed Tasks
Data Chunks Moving Through Layer of Pipeline
Synchronize after Scatter Hinders Pipeline
Two Sync Points Make Asynchrony Difficult
Minimizing Effects of Asynchrony on Convergence
Serverless Optimizations
Data Graphs
We Evaluated Several Aspects of Dorylus
High Value on Large-Sparse Graphs
Dorylus Outperforms Existing Systems
Dorylus Scales Full Graph Training
Conclusion: Dorylus Provides Value


Taught by

USENIX
