YoVDO

Scaling MLOps to Retrain 50k Weekly Models in Parallel Using UDFs

Offered By: Databricks via YouTube

Tags

MLOps Courses, Machine Learning Courses, Databricks Courses, Data Engineering Courses, Scalability Courses, Parallel Processing Courses, MLflow Courses

Course Description

Overview

Discover how data.ai's machine learning team uses the Databricks Platform to implement MLOps best practices for high-frequency retraining in this 32-minute conference talk. Learn about the framework the team built to bring MLOps into the weekly retraining of approximately 50,000 sklearn models in parallel. Explore how Pandas UDFs can apply arbitrary code to each group of a dataset, enabling MLflow logging and model registration at scale for any grouped data. Gain insights into the challenges of parallelizing model training across multiple categories and countries, and understand the limitations of this approach. Consider how this methodology could be adapted for more time-sensitive use cases. Presented by Kaleb Lowe, Staff Machine Learning Engineer at data.ai, this talk is aimed at data scientists and machine learning engineers working on large-scale model retraining projects.
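
The per-group pattern described above can be illustrated with a minimal sketch. It is not the presenter's framework: the column names (category, country, x, y), the helper names train_group and train_all_local, and the NumPy least-squares fit standing in for an sklearn estimator are all assumptions made for illustration. The sketch runs locally with pandas; the comments note where the same function could be handed to Spark's grouped-map API and where MLflow logging would likely go.

```python
import numpy as np
import pandas as pd


def train_group(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit one small model on the rows of a single (category, country) group.

    Hypothetical helper: a NumPy straight-line fit stands in for the
    sklearn model mentioned in the talk description.
    """
    slope, intercept = np.polyfit(pdf["x"], pdf["y"], deg=1)
    # Assumption: in a real job, per-group MLflow logging and model
    # registration would happen here, inside the function Spark runs
    # once per group.
    return pd.DataFrame(
        {
            "category": [pdf["category"].iloc[0]],
            "country": [pdf["country"].iloc[0]],
            "slope": [float(slope)],
            "intercept": [float(intercept)],
            "n_rows": [len(pdf)],
        }
    )


def train_all_local(df: pd.DataFrame) -> pd.DataFrame:
    """Local stand-in for the distributed step: one train_group call per group."""
    parts = [train_group(group) for _, group in df.groupby(["category", "country"])]
    return pd.concat(parts, ignore_index=True)


# Toy data with two groups (values are made up for the example).
data = pd.DataFrame(
    {
        "category": ["games"] * 3 + ["tools"] * 3,
        "country": ["US"] * 3 + ["JP"] * 3,
        "x": [0, 1, 2, 0, 1, 2],
        "y": [1, 3, 5, 2, 2, 2],
    }
)

result = train_all_local(data)
print(result)

# On Spark, the same function could be distributed with the grouped-map API,
# assuming a Spark DataFrame `spark_df` with the same columns:
#
#   spark_df.groupBy("category", "country").applyInPandas(
#       train_group,
#       schema="category string, country string, "
#              "slope double, intercept double, n_rows long",
#   )
```

Because train_group takes and returns a plain pandas DataFrame, it can be tested on a laptop before being run across thousands of groups on a cluster.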

Syllabus

Scaling MLOps to Retrain 50k Weekly Models in Parallel Using UDFs.


Taught by

Databricks

Related Courses

Data Processing with Azure
LearnQuest via Coursera
Mejores prácticas para el procesamiento de datos en Big Data
Coursera Project Network via Coursera
Data Science with Databricks for Data Analysts
Databricks via Coursera
Azure Data Engineer con Databricks y Azure Data Factory
Coursera Project Network via Coursera
Curso Completo de Spark con Databricks (Big Data)
Coursera Project Network via Coursera