YoVDO

Conspirator - SmartNIC-Aided Control Plane for Distributed ML Workloads

Offered By: USENIX via YouTube

Tags

Distributed Machine Learning Courses
Cost Optimization Courses
RDMA Courses
SmartNICs Courses

Course Description

Overview

This conference talk from USENIX ATC '24 introduces Conspirator, a control plane design for distributed machine learning workloads. Conspirator uses SmartNICs to address two problems at once: CPU bottlenecks and suboptimal accelerator scheduling. It transfers data efficiently without involving the host CPU, and it integrates a new scheduling algorithm that adapts to heterogeneous accelerators and changing workload dynamics.

The talk reports a 15% reduction in end-to-end completion time compared to RDMA-based alternatives, 17% better cost-effectiveness, 44% better power efficiency, and a 33% reduction in GPU hours from optimized scheduling decisions. The 18-minute presentation, by researchers from Northwestern University and Hewlett Packard Labs, also discusses the evolving role of SmartNICs in managing distributed ML workloads.

Syllabus

USENIX ATC '24 - Conspirator: SmartNIC-Aided Control Plane for Distributed ML Workloads


Taught by

USENIX

Related Courses

FlexTOE - Flexible TCP Offload with Fine-Grained Parallelism
USENIX via YouTube
Using SmartNICs to Provide Better Data Center Security
44CON Information Security Conference via YouTube
Rearchitecting the TCP Stack for I/O-Offloaded Content Delivery
USENIX via YouTube
QEMU Storage Daemon and libblkio: Exploring New Frontiers for the QEMU Block Layer
Linux Foundation via YouTube
Using Kubernetes with Data Processing Units to Offload Infrastructure
CNCF [Cloud Native Computing Foundation] via YouTube