Scaling XGBoost for Thousands of Features with Databricks
Offered By: Databricks via YouTube
Course Description
Overview
Explore scaling techniques for XGBoost models with thousands of features in this 51-minute conference talk from Databricks. Dive into an online advertising use case that enables marketers to target users based on demographic information. Learn about the challenges faced, mistakes made, and valuable insights gained during the process of scaling XGBoost model training. Discover common pitfalls to avoid and notable differences between Python and Scala implementations of XGBoost in Spark. Gain practical knowledge from experts Phan Chuong and Eric Yatskowitz as they share their experiences in scaling machine learning models for production environments and supporting marketing decisions with data insights.
Syllabus
Intro
Welcome
Recording
Boulder Denver Group
Databricks Summit 2022
Phan and Eric
Introduction
Agenda
T-Mobile Marketing Solutions
Magenta Marketing Platform
Why don't we just use this data directly
How are demographic insights used
Pandas
UDF
Improving XGBoost
Data set
Why XGBoost
What we did
How did we achieve that
Parallelizations
Autoscaling
Normal transformation
Pivot vs Vector
RDD
Questions
Taught by
Databricks
Related Courses
Data Science at Scale - Capstone Project
University of Washington via Coursera
Feature Engineering for Improving Learning Environments
University of Texas Arlington via edX
How to Win a Data Science Competition: Learn from Top Kagglers
Higher School of Economics via Coursera
Advanced Machine Learning
The Open University via FutureLearn
Feature Engineering
Google Cloud via Coursera