YoVDO

Swing - Short-cutting Rings for Higher Bandwidth Allreduce

Offered By: USENIX via YouTube

Tags

Distributed Systems Courses, Machine Learning Courses, Network Topologies Courses, High Performance Computing Courses

Course Description

Overview

Explore a 19-minute conference talk from NSDI '24 that introduces Swing, an algorithm designed to improve allreduce performance on torus networks. Swing reduces the number of hops between communicating nodes by swinging between torus directions, which yields up to 3x better performance than existing allreduce algorithms. The talk shows that this holds across a range of vector sizes and torus-like topologies, regardless of their shape and size. It also covers why allreduce matters in distributed systems: the operation has a large impact on workload runtime, and torus networks are used both in machine learning-optimized systems such as Google TPUs and Amazon Trainium devices and in Top500 supercomputers. Finally, it explains the challenges that torus networks pose for allreduce and how Swing addresses them to achieve higher bandwidth.
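To make the idea of "swinging between directions" concrete, the sketch below compares two peer-selection patterns on a one-dimensional ring, which is a simplification of a torus. Nothing in it comes from this listing: the function names are hypothetical, the node count is assumed to be a power of two, and the alternating offset rule `swing_offset` is an assumption about how such a pattern could look, not a statement of the talk's exact algorithm. Hop count is also only a rough proxy for bandwidth.

```python
def ring_distance(a, b, p):
    """Minimum number of hops between nodes a and b on a ring of p nodes."""
    d = (a - b) % p
    return min(d, p - d)


def recursive_doubling_peer(r, s, p):
    """Classic recursive doubling: at step s, pair with the rank that differs in bit s."""
    return r ^ (1 << s)


def swing_offset(s):
    """Assumed alternating offset: sum of (-2)^i for i = 0..s, i.e. 1, -1, 3, -5, 11, ..."""
    return (1 - (-2) ** (s + 1)) // 3


def swing_peer(r, s, p):
    """Assumed rule: even ranks add the offset, odd ranks subtract it (mod p)."""
    rho = swing_offset(s)
    return (r + rho) % p if r % 2 == 0 else (r - rho) % p


def total_hops(peer_fn, p):
    """Sum of ring distances travelled by node 0 over log2(p) steps."""
    steps = p.bit_length() - 1  # log2(p) when p is a power of two
    return sum(ring_distance(0, peer_fn(0, s, p), p) for s in range(steps))


def simulate_allreduce(values, peer_fn):
    """Sum-allreduce: at every step each node adds its peer's partial sum to its own."""
    p = len(values)
    steps = p.bit_length() - 1
    current = list(values)
    for s in range(steps):
        current = [current[r] + current[peer_fn(r, s, p)] for r in range(p)]
    return current


if __name__ == "__main__":
    p = 16
    print(total_hops(recursive_doubling_peer, p))  # 1 + 2 + 4 + 8 = 15
    print(total_hops(swing_peer, p))               # 1 + 1 + 3 + 5 = 10
    print(simulate_allreduce(list(range(p)), swing_peer)[0])  # 120, the sum of 0..15
```

Under these assumptions both patterns finish in the same number of steps and produce the same result on every node, but the alternating pattern sends each message over fewer ring hops, which is the intuition the talk builds on.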

Syllabus

NSDI '24 - Swing: Short-cutting Rings for Higher Bandwidth Allreduce


Taught by

USENIX

Related Courses

Introduction to Artificial Intelligence
Stanford University via Udacity
Natural Language Processing
Columbia University via Coursera
Probabilistic Graphical Models 1: Representation
Stanford University via Coursera
Computer Vision: The Fundamentals
University of California, Berkeley via Coursera
Learning from Data (Introductory Machine Learning course)
California Institute of Technology via Independent