YoVDO

Systems Engineering in Machine Learning - Navigating Low-Level Challenges

Offered By: MLOps.community via YouTube

Tags

MLOps Courses, Machine Learning Courses, PyTorch Courses, Systems Engineering Courses, Distributed Training Courses, Flyte Courses

Course Description

Overview

Dive into a 48-minute podcast episode featuring Andrew Dye, a systems engineer navigating the world of machine learning. Explore the intersection of low-level engineering and MLOps as Andrew shares insights from his experience as a tech lead for ML Infrastructure at Meta and his current role as a software engineer at Union. Learn about distributed training, managing large-scale ML systems, and bridging the gap between firmware and MLOps. Discover how different engineering disciplines can work together effectively in MLOps teams, and gain valuable perspectives on the future of ML infrastructure and tooling. Hear discussions on topics such as execution patterns, rapid change adoption, consensus challenges, and the importance of abstractions in ML engineering.

Syllabus

Andrew's preferred coffee
Introduction to Andrew Dye
Takeaways
Huge shoutout to our sponsors UnionML and UnionAI!
Andrew's background
Andrew's learning curve
Bridging the gap between firmware space and MLOps
In connection with the PyTorch team
Things Andrew should have learned sooner
Type of scale Andrew works on
Distributed training at Meta
Managing the huge search space
Execution patterns programs
Non-ML engineers dealing with ML engineers having the same skill set
Pace of rapid change adoption
Consensus challenges
Abstractions making sense now
Comparing to others
General principles in UnionAI tooling
Seeing the future
Inter-task checkpointing
Combining functionality with use cases
Wrap up


Taught by

MLOps.community

Related Courses

Building Robust ML Production Systems Using OSS Tools for Continuous Delivery for ML
Linux Foundation via YouTube
Efficient Data Parallel Distributed Training with Flyte, Spark and Horovod
Linux Foundation via YouTube
Embracing Multi-Tenancy While Scaling MLOps
CNCF [Cloud Native Computing Foundation] via YouTube
Enforcing Data Quality in Data Processing and ML Pipelines with Flyte and Pandera
Linux Foundation via YouTube