Conspirator - SmartNIC-Aided Control Plane for Distributed ML Workloads
Offered By: USENIX via YouTube
Course Description
Overview
This USENIX ATC '24 conference talk introduces Conspirator, a control plane design for distributed machine learning workloads. Conspirator uses SmartNICs to address two problems at once: CPU bottlenecks and suboptimal accelerator scheduling. It transfers data efficiently without involving the host CPU and integrates a new scheduling algorithm that adapts to heterogeneous accelerators and changing workload dynamics. Reported results include a 15% reduction in end-to-end completion time compared to RDMA-based alternatives, 17% better cost-effectiveness, 44% better power efficiency, and a 33% reduction in GPU hours through optimized scheduling decisions. The 18-minute presentation, by researchers from Northwestern University and Hewlett Packard Labs, also looks at the evolving role of SmartNICs in distributed ML workload management.
Syllabus
USENIX ATC '24 - Conspirator: SmartNIC-Aided Control Plane for Distributed ML Workloads
Taught by
USENIX
Related Courses
Windows Server 2019: Advanced Networking Features (LinkedIn Learning)
Deep Dive into GPU Support in Apache Spark 3.x - Accelerator-Aware Scheduling and RAPIDS Plugin (Databricks via YouTube)
Microsecond Consensus for Microsecond Applications (USENIX via YouTube)
An Edge-Queued Datagram Service for All Datacenter Traffic (USENIX via YouTube)
Building a High Performance Network in the Public Cloud Using RDMA - First Principles (Oracle via YouTube)