Large-Scale Data Shuffle in Ray with Exoshuffle

Offered By: Anyscale via YouTube

Tags

Machine Learning Courses, Data Sorting Courses, Distributed Computing Courses

Course Description

Overview

This 26-minute conference talk from Anyscale introduces Exoshuffle, a system for large-scale data shuffle. Shuffle is a crucial primitive in data processing applications, and Exoshuffle challenges conventional wisdom by implementing high-performance, reliable shuffle on Ray, a general-purpose distributed computing system. The talk reports that Exoshuffle outperforms Spark and achieves 82% of theoretical performance on a 100 TB sort using 100 nodes. It also covers the integration of Exoshuffle with the Datasets library in Ray 2.0, which gives machine learning users large-scale shuffle capabilities. The talk is aimed at data scientists, engineers, and anyone interested in large-scale data processing.

Syllabus

Large-scale data shuffle in Ray with Exoshuffle


Taught by

Anyscale

Related Courses

Introduction to Artificial Intelligence
Stanford University via Udacity
Natural Language Processing
Columbia University via Coursera
Probabilistic Graphical Models 1: Representation
Stanford University via Coursera
Computer Vision: The Fundamentals
University of California, Berkeley via Coursera
Learning from Data (Introductory Machine Learning course)
California Institute of Technology via Independent