
Scaling Distributed XGBoost and Parallel Data Ingestion with Ray - FlightAware Case Study

Offered By: Anyscale via YouTube

Tags

Machine Learning Courses
Amazon Web Services (AWS) Courses
Predictive Modeling Courses
Distributed Computing Courses
Data Ingestion Courses
XGBoost Courses
Parquet Courses

Course Description

Overview

Explore how FlightAware leverages Ray and AWS to scale distributed XGBoost training and parallel data ingestion for its runway prediction model. Dive into the process of building a cost-effective, scalable solution that efficiently loads terabytes of training data from S3 into distributed memory. Learn about organizing training data, configuring fault-tolerant and elastic Ray clusters, using the Amazon FSx for Lustre filesystem, and tracking metrics with MLflow. Gain practical tips for optimizing cost and training time discovered during the implementation. This 30-minute talk, hosted by Anyscale, showcases how Ray handles vast amounts of global aircraft data and demonstrates how to build an efficient distributed XGBoost training system for large-scale machine learning applications.
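
The pattern described above, parallel ingestion of Parquet training data from S3 with Ray Data followed by distributed XGBoost training with Ray Train, can be sketched roughly as follows. This is a minimal illustration, not FlightAware's actual pipeline: the bucket path, label column, cluster sizing, and hyperparameters are placeholder assumptions.

```python
# Minimal sketch: distributed XGBoost training with Ray Data + Ray Train.
# The S3 path, label column, and worker counts are illustrative placeholders.
import ray
from ray.train import ScalingConfig
from ray.train.xgboost import XGBoostTrainer

ray.init()  # connect to the Ray cluster

# Parallel ingestion: read partitioned Parquet training data from S3
# into distributed memory across the cluster.
train_ds = ray.data.read_parquet("s3://example-bucket/runway-training-data/")

trainer = XGBoostTrainer(
    scaling_config=ScalingConfig(
        num_workers=8,              # one distributed training worker per actor
        use_gpu=False,
        resources_per_worker={"CPU": 8},
    ),
    label_column="label",           # placeholder target column name
    params={
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
        "tree_method": "hist",
    },
    datasets={"train": train_ds},
    num_boost_round=100,
)

result = trainer.fit()
print(result.metrics)
```

Details covered in the talk, such as elastic cluster configuration, FSx for Lustre mounts, and MLflow metric tracking, would layer on top of this basic structure.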

Syllabus

FlightAware and Ray: Scaling Distributed XGBoost and Parallel Data Ingestion


Taught by

Anyscale

Related Courses

Python for Data Science Tips, Tricks, & Techniques
LinkedIn Learning
Sound Data Engineering in Rust - From Bits to DataFrames
Databricks via YouTube
Recent Parquet Improvements in Apache Spark - Vectorized Complex Types and Column Index Support
Databricks via YouTube
Optimizing Spark SQL Jobs with Parallel and Asynchronous IO
Databricks via YouTube
Degrading Performance - Understanding and Solving Small Files Syndrome
Databricks via YouTube