Large-Scale Data Shuffle in Ray with Exoshuffle
Offered By: Anyscale via YouTube
Course Description
Overview
Explore the Exoshuffle system for large-scale data processing in this 26-minute conference talk from Anyscale. Shuffle is a core primitive in data processing applications, and Exoshuffle challenges conventional wisdom by implementing high-performance, reliable shuffle on Ray, a general-purpose distributed computing system, rather than in a dedicated data processing engine. Learn how Exoshuffle outperforms Spark and reaches 82% of theoretical performance on a 100 TB sort using 100 nodes. See how Exoshuffle is integrated with the Datasets library in Ray 2.0, bringing large-scale shuffle to machine learning users. The talk is aimed at data scientists, engineers, and anyone interested in large-scale data processing.
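The talk itself is not transcribed here. What follows is only a minimal, single-process sketch of the generic map/partition/reduce pattern that a shuffle-based distributed sort follows, and it is not Exoshuffle's code. The function names `map_task`, `reduce_task`, and `shuffle_sort` are hypothetical, and keys are assumed to be integers in `[0, key_space)`. In a Ray-based version, each map and reduce call would typically run as a separate remote task, with the partitions passed between tasks as objects.

```python
from typing import List, Tuple

Record = Tuple[int, str]  # (key, value); integer keys are an assumption of this sketch


def map_task(block: List[Record], num_reducers: int, key_space: int) -> List[List[Record]]:
    """Map phase: range-partition one input block into one bucket per reducer."""
    width = -(-key_space // num_reducers)  # ceiling division: keys per reducer range
    partitions: List[List[Record]] = [[] for _ in range(num_reducers)]
    for key, value in block:
        if not 0 <= key < key_space:
            raise ValueError(f"key {key} outside [0, {key_space})")
        # Reducer r receives keys in [r * width, (r + 1) * width).
        partitions[key // width].append((key, value))
    return partitions


def reduce_task(pieces: List[List[Record]]) -> List[Record]:
    """Reduce phase: merge the pieces sent by every mapper and sort them by key."""
    merged = [rec for piece in pieces for rec in piece]
    merged.sort(key=lambda rec: rec[0])
    return merged


def shuffle_sort(blocks: List[List[Record]], num_reducers: int, key_space: int) -> List[Record]:
    """Sort all records by key using a map -> all-to-all exchange -> reduce shuffle."""
    map_outputs = [map_task(b, num_reducers, key_space) for b in blocks]
    # All-to-all exchange: reducer r collects partition r from every mapper.
    reduce_outputs = [
        reduce_task([m[r] for m in map_outputs]) for r in range(num_reducers)
    ]
    # Range partitioning means concatenating reducer outputs yields a global order.
    return [rec for out in reduce_outputs for rec in out]


if __name__ == "__main__":
    blocks = [[(7, "a"), (2, "b")], [(9, "c"), (0, "d"), (5, "e")]]
    print(shuffle_sort(blocks, num_reducers=3, key_space=10))
    # prints [(0, 'd'), (2, 'b'), (5, 'e'), (7, 'a'), (9, 'c')]
```

Range partitioning is used here, rather than hash partitioning, because it makes each reducer's output a contiguous slice of the final sorted order, which is the usual structure of a shuffle-based sort benchmark.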
Syllabus
Large-scale data shuffle in Ray with Exoshuffle
Taught by
Anyscale