Large-Scale Data Shuffle in Ray with Exoshuffle
Offered By: Anyscale via YouTube
Course Description
Overview
This 26-minute conference talk from Anyscale introduces Exoshuffle, a system for large-scale data shuffle built on Ray. Shuffle is a crucial primitive in data processing applications, and Exoshuffle challenges conventional wisdom by implementing a high-performance, reliable shuffle on Ray, a general-purpose distributed computing system. Learn how Exoshuffle outperforms Spark and achieves 82% of theoretical performance on a 100 TB sort using 100 nodes. The talk also covers the integration of Exoshuffle with the Datasets library in Ray 2.0, which brings large-scale shuffle capabilities to machine learning users. It is aimed at data scientists, engineers, and anyone interested in large-scale data processing techniques.
Syllabus
Large-scale data shuffle in Ray with Exoshuffle
Taught by
Anyscale
Related Courses
Introduction to Databases (Meta via Coursera)
Analyzing Big Data with SQL (Cloudera via Coursera)
Query Client Data with LibreOffice Base (Coursera Project Network via Coursera)
Unix Command Course for Beginners (Udemy)
Excel For Beginners! Top 30 Hottest Tutorials, Tips & Tricks! (Udemy)