Bagua - Lightweight Distributed Learning on Kubernetes

Offered By: CNCF [Cloud Native Computing Foundation] via YouTube

Tags

Conference Talks Courses, Kubernetes Courses, Horizontal Scaling Courses, Distributed Deep Learning Courses

Course Description

Overview

Explore a conference talk on Bagua, a lightweight distributed learning framework for Kubernetes developed by Kuaishou Technology and ETH Zürich. Discover how Bagua supports high-performance distributed deep learning without requiring special network devices or restrictive scheduling. Learn how its communication algorithms and integration with Kubernetes let training scale horizontally with good speedup over ordinary Ethernet connections. Examine Bagua's effectiveness across a range of scenarios and models, including ResNet on ImageNet, BERT-Large, and large-scale industrial applications at Kuaishou. See how its performance compares with existing systems: in production Kubernetes clusters, Bagua outperforms PyTorch-DDP, Horovod, and BytePS in end-to-end training time by up to 1.95 times. Understand how Bagua addresses the challenges of training recommendation models with massive parameter counts, video and image understanding models with billions of samples, and automatic speech recognition (ASR) models on terabyte-scale datasets.

Syllabus

Bagua: Lightweight Distributed Learning on Kubernetes - Xiangru Lian & Xianghong Li, Kuaishou


Taught by

CNCF [Cloud Native Computing Foundation]

Related Courses

Building Geospatial Apps on Postgres, PostGIS, & Citus at Large Scale
Microsoft via YouTube
Unlocking the Power of ML for Your JavaScript Applications with TensorFlow.js
TensorFlow via YouTube
Managing the Reactive World with RxJava - Jake Wharton
ChariotSolutions via YouTube
What's New in Grails 2.0
ChariotSolutions via YouTube
Performance Analysis of Apache Spark and Presto in Cloud Environments
Databricks via YouTube