HammingMesh: A Network Topology for Large-Scale Deep Learning

Offered By: Scalable Parallel Computing Lab, SPCL @ ETH Zurich via YouTube

Tags

Deep Learning Courses, Network Topologies Courses, Parallel Computing Courses, Scalability Courses, Supercomputing Courses

Course Description

Overview

Explore a new network topology for large-scale deep learning systems in this award-winning talk from ACM/IEEE Supercomputing 2022 (SC22). The talk presents HammingMesh, a design developed by the Scalable Parallel Computing Lab at ETH Zurich to address the data-movement bottlenecks of AI training. Learn how HammingMesh delivers high bandwidth at low cost while giving the job scheduler more flexibility, providing full bandwidth and isolation to deep learning training jobs that use two-dimensional parallelism. See how it also sustains high global bandwidth for generic traffic, which makes it a strong candidate network for future AI systems with extreme bandwidth requirements. Gain insight into the workload analysis that informed the topology's design, and into how HammingMesh aims to overcome the performance limits of current training systems, potentially unlocking the next phase of growth in modern AI.

Syllabus

HammingMesh: A Network Topology for Large-Scale Deep Learning


Taught by

Scalable Parallel Computing Lab, SPCL @ ETH Zurich

Related Courses

An Introduction to Computer Networks
Stanford University via Independent
Computer Networks and the Internet
Kiron via edX
IT Support: Networking Essentials
Microsoft via edX
Digital Switching - I
Indian Institute of Technology Kanpur via Swayam
How To Build a Network Topology Using GNS3
Coursera Project Network via Coursera