Machine Learning Infrastructure at Facebook Scale
Offered By: MLOps World: Machine Learning in Production via YouTube
Course Description
Overview
Explore the challenges and solutions in scaling machine learning infrastructure at Facebook in this 18-minute conference talk from MLOps World: Machine Learning in Production. Gain insights into how Facebook's AI Infrastructure team reimagined their entire stack to support rapidly growing ranking models serving over a billion users. Discover the approach taken to redesign and scale the infrastructure, including the creation of specialized hardware using powerful GPUs and network devices, and the development of optimized distributed training algorithms using PyTorch. Learn from Senior AI Infra Engineer Shivam Bharuka as he shares his experience in driving performance, reliability, and efficiency-oriented designs across Facebook's AI Infrastructure components.
Syllabus
Machine Learning Infrastructure at Facebook Scale
Taught by
MLOps World: Machine Learning in Production
Related Courses
Custom and Distributed Training with TensorFlow (DeepLearning.AI via Coursera)
Architecting Production-ready ML Models Using Google Cloud ML Engine (Pluralsight)
Building End-to-end Machine Learning Workflows with Kubeflow (Pluralsight)
Deploying PyTorch Models in Production: PyTorch Playbook (Pluralsight)
Inside TensorFlow (TensorFlow via YouTube)