Scaling MLOps to Retrain 50k Weekly Models in Parallel Using UDFs
Offered By: Databricks via YouTube
Course Description
Overview
Discover how data.ai's machine learning team uses the Databricks Platform to implement MLOps best practices for high-frequency retraining in this 32-minute conference talk. Learn about the framework the team built to bring MLOps into the weekly retraining of approximately 50,000 sklearn models in parallel. Explore how Pandas UDFs can apply arbitrary code to each group of a dataset, enabling MLflow logging and model registration at scale for any grouped data. Gain insight into the challenges of parallelizing model training across multiple categories and countries, and understand the limitations of this approach. Consider how the methodology could be adapted for more time-sensitive use cases. Presented by Kaleb Lowe, Staff Machine Learning Engineer at data.ai, the talk is aimed at data scientists and machine learning engineers working on large-scale model retraining projects.
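The grouped-training pattern mentioned in the overview can be illustrated with a minimal sketch. This is not the speaker's framework: the column names (`category`, `country`, `x`, `y`), the function `train_one_group`, and the synthetic data are all hypothetical, and a NumPy least-squares line stands in for an sklearn estimator so that the example runs with only pandas and NumPy. The key idea is that the function takes one group's rows as a pandas DataFrame and returns a pandas DataFrame, which is the shape of function that grouped Pandas UDFs expect.

```python
import numpy as np
import pandas as pd

# Hypothetical grouping columns; the talk mentions categories and countries.
GROUP_COLS = ["category", "country"]


def train_one_group(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit one small model on a single group's rows and return a one-row summary."""
    # Stand-in for an sklearn estimator (assumption): a least-squares line
    # y = slope * x + intercept fitted with NumPy.
    slope, intercept = np.polyfit(pdf["x"].to_numpy(), pdf["y"].to_numpy(), deg=1)

    # In a real job, per-group MLflow calls would go here, for example
    # mlflow.start_run(), mlflow.log_metric(...), mlflow.sklearn.log_model(...).
    # They are omitted so the sketch has no MLflow dependency.
    return pd.DataFrame(
        {
            "category": [pdf["category"].iloc[0]],
            "country": [pdf["country"].iloc[0]],
            "slope": [float(slope)],
            "intercept": [float(intercept)],
            "n_rows": [len(pdf)],
        }
    )


# Tiny synthetic data set with two (category, country) groups.
data = pd.DataFrame(
    {
        "category": ["games", "games", "games", "tools", "tools", "tools"],
        "country": ["US", "US", "US", "JP", "JP", "JP"],
        "x": [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
        "y": [1.0, 3.0, 5.0, 4.0, 3.0, 2.0],
    }
)

# Local check with plain pandas: call the function once per group.
results = pd.concat(
    [train_one_group(group) for _, group in data.groupby(GROUP_COLS)],
    ignore_index=True,
)
print(results)
```

On a Spark cluster, the same function could be handed to PySpark's grouped-map API, along the lines of `spark_df.groupBy("category", "country").applyInPandas(train_one_group, schema="category string, country string, slope double, intercept double, n_rows long")`, so that Spark runs one call per group across the workers. Whether the talk uses exactly this call is an assumption.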
Syllabus
Scaling MLOps to Retrain 50k Weekly Models in Parallel Using UDFs.
Taught by
Databricks
Related Courses
内存数据库管理 (In-Memory Database Management)
openHPI
CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Processing Big Data with Azure Data Lake Analytics
Microsoft via edX
Google Cloud Big Data and Machine Learning Fundamentals en Español
Google Cloud via Coursera
Google Cloud Big Data and Machine Learning Fundamentals 日本語版
Google Cloud via Coursera