Scaling MLOps to Retrain 50k Weekly Models in Parallel Using UDFs
Offered By: Databricks via YouTube
Course Description
Overview
Discover how data.ai's machine learning team uses the Databricks platform to apply MLOps best practices to high-frequency retraining in this 32-minute conference talk. Learn about the framework the team built to bring MLOps to the weekly retraining of approximately 50,000 sklearn models in parallel. Explore how Pandas UDFs can apply arbitrary code to each group of a dataset, enabling MLflow logging and model registration at scale for any grouped data. Gain insight into the challenges of parallelizing model training across many categories and countries, and understand the limitations of this approach. Consider how the methodology could be adapted to more time-sensitive use cases. Presented by Kaleb Lowe, Staff Machine Learning Engineer at data.ai, the talk is relevant to data scientists and machine learning engineers working on large-scale model retraining projects.
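To make the grouped-UDF idea concrete, here is a minimal sketch, assuming a training table with hypothetical `country`, `category`, `week`, and `target` columns. Each group gets its own model through a plain pandas function intended for PySpark's `GroupedData.applyInPandas`. The model is a stand-in: a least-squares linear trend from `np.polyfit` rather than the sklearn estimators the talk describes. The `LOG_TO_MLFLOW` flag, `RESULT_SCHEMA`, and run names are illustrative assumptions, not data.ai's framework. With real sklearn models, registration could go through `mlflow.sklearn.log_model` and its `registered_model_name` argument.

```python
import numpy as np
import pandas as pd

# Hypothetical switch: enable on a cluster where MLflow tracking is configured.
LOG_TO_MLFLOW = False

# Spark DDL schema of the rows train_group returns (hypothetical column names).
RESULT_SCHEMA = (
    "country string, category string, n_rows long, "
    "slope double, intercept double, rmse double"
)


def train_group(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit one model on all rows of a single (country, category) group.

    With applyInPandas, Spark calls this once per group and passes that
    group's rows as a pandas DataFrame, so the body can be arbitrary Python.
    """
    country = str(pdf["country"].iloc[0])
    category = str(pdf["category"].iloc[0])
    x = pdf["week"].to_numpy(dtype=float)
    y = pdf["target"].to_numpy(dtype=float)

    if len(pdf) >= 2:
        # Stand-in model: a least-squares linear trend instead of an sklearn estimator.
        slope, intercept = np.polyfit(x, y, deg=1)
        rmse = float(np.sqrt(np.mean((slope * x + intercept - y) ** 2)))
    else:
        # Too few rows to fit a line; report NaN rather than failing the whole job.
        slope = intercept = rmse = float("nan")

    if LOG_TO_MLFLOW:
        import mlflow  # imported lazily so the sketch runs without MLflow installed

        # One MLflow run per group; the run naming scheme is an assumption.
        with mlflow.start_run(run_name=f"{country}-{category}"):
            mlflow.log_params({"country": country, "category": category})
            mlflow.log_metric("rmse", rmse)

    # One summary row per group; columns must match RESULT_SCHEMA.
    return pd.DataFrame(
        [
            {
                "country": country,
                "category": category,
                "n_rows": len(pdf),
                "slope": float(slope),
                "intercept": float(intercept),
                "rmse": rmse,
            }
        ]
    )
```

On a Spark DataFrame `sdf` with those columns, the fan-out would be `sdf.groupBy("country", "category").applyInPandas(train_group, schema=RESULT_SCHEMA)`, which yields one summary row per group. Because the function only sees pandas, it can also be checked locally by looping over a pandas `groupby`. Returning a small summary frame rather than the fitted model keeps the Spark output schema simple; the models themselves would be tracked in MLflow.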
Syllabus
Scaling MLOps to Retrain 50k Weekly Models in Parallel Using UDFs.
Taught by
Databricks
Related Courses
Financial Sustainability: The Numbers side of Social Enterprise
+Acumen via NovoEd
Cloud Computing Concepts: Part 2
University of Illinois at Urbana-Champaign via Coursera
Developing Repeatable Models® to Scale Your Impact
+Acumen via Independent
Managing Microsoft Windows Server Active Directory Domain Services
Microsoft via edX
Introduction aux conteneurs (Introduction to Containers)
Microsoft Virtual Academy via OpenClassrooms