AI Inference Workloads - Solving MLOps Challenges in Production
Offered By: Toronto Machine Learning Series (TMLS) via YouTube
Course Description
Overview
Explore the challenges of running AI inference workloads in production, and the solutions to them, in this 55-minute conference talk from the Toronto Machine Learning Series. Dive into the complexities of moving machine learning prototypes to production, focusing on throughput, latency, and GPU utilization. Learn about fractional GPU capabilities and their impact on performance. Discover how a leading organization built an inference platform on Kubernetes and NVIDIA A100 MIG technology to scale its AI initiatives. Gain insights into deployment types for inference workloads, embedding ML models into web servers, and decoupling web serving from model serving. Understand the concept of Multi-Instance GPU (MIG) and its applications in model inferencing. Benefit from the speaker's expertise in DevOps, cloud computing, Kubernetes, and AI computing to overcome MLOps challenges and optimize your AI inference workflows.
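The overview's contrast between embedding models in web servers and decoupling web serving from model serving is the talk's central deployment question. As a minimal sketch (not code from the talk: the model file, route names, and model-server URL are all illustrative assumptions), the two patterns look roughly like this in Python:

```python
# Minimal sketch of the two deployment types named in the talk.
# Everything here is illustrative: the model file, URLs, and route
# names are assumptions, not details taken from the talk itself.
from flask import Flask, jsonify, request
import joblib    # assumes a scikit-learn-style model saved with joblib.dump
import requests  # used only by the decoupled variant below

app = Flask(__name__)

# --- Pattern 1: embedded serving ------------------------------------------
# The model lives inside the web server process, so web traffic and
# inference compete for the same resources and must scale together.
model = joblib.load("model.joblib")

@app.route("/predict-embedded", methods=["POST"])
def predict_embedded():
    features = request.get_json()["features"]
    return jsonify({"prediction": model.predict([features]).tolist()})

# --- Pattern 2: decoupled serving -----------------------------------------
# The web server only validates and forwards requests; inference runs in a
# separate model-serving system (e.g., one scheduled on Kubernetes), so the
# two tiers can scale and fail independently.
MODEL_SERVER_URL = "http://model-server:8500/v1/predict"  # placeholder address

@app.route("/predict-decoupled", methods=["POST"])
def predict_decoupled():
    resp = requests.post(MODEL_SERVER_URL, json=request.get_json(), timeout=2.0)
    return jsonify(resp.json()), resp.status_code

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

The decoupled pattern is what lets the model tier run on GPU nodes and scale with inference load while the web tier scales with request traffic, which is the premise for the Kubernetes-based serving system and fractional-GPU scheduling the talk goes on to cover.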
Syllabus
Intro
Agenda
The Machine Learning Process
Deployment Types for Inference Workloads
Machine Learning is Different from Traditional Software Engineering
Low Latency
High Throughput
Maximize GPU Utilization
Embedding ML Models into Web Servers
Decouple Web Serving and Model Serving
Model Serving System on Kubernetes
Multi-Instance GPU (MIG)
Run:ai's Dynamic MIG Allocations
Run 3 instances of type 2g.10gb
Valid Profiles & Configurations
Serving on Fractional GPUs (see the sketch after this syllabus)
A Game Changer for Model Inferencing
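The MIG items above describe partitioning a single A100 into isolated fractional GPUs: on a 40 GB A100, the 2g.10gb profile pairs two of the GPU's seven compute slices with 10 GB of memory, which is why three such instances can run side by side as the syllabus notes. Below is a hedged sketch of one common way a serving process is pinned to a single MIG slice; the UUID is a placeholder, and the tiny PyTorch model stands in for a real one, since the talk does not show this exact code.

```python
# Hedged sketch: confining one serving process to one MIG slice. The MIG
# UUID below is a placeholder; real IDs come from `nvidia-smi -L` on a
# MIG-enabled GPU. This shows one common pattern, not the exact mechanism
# demonstrated in the talk.
import os

# CUDA enumerates MIG instances by UUID; setting CUDA_VISIBLE_DEVICES before
# any CUDA library loads confines this process to a single 2g.10gb slice,
# so several such processes can share one physical A100 in isolation.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

import torch  # imported after the env var so it only ever sees that slice

device = torch.device("cuda:0")               # index 0 of the visible devices
model = torch.nn.Linear(512, 10).to(device)   # stand-in for a real model
batch = torch.randn(32, 512, device=device)
with torch.no_grad():
    print(model(batch).shape)                 # torch.Size([32, 10])
```

On Kubernetes, the NVIDIA device plugin can instead expose each slice as a schedulable resource such as nvidia.com/mig-2g.10gb, so a pod requests a fraction of a GPU the same way it would request a whole one; as the Run:ai syllabus item suggests, its dynamic MIG allocation builds on this by provisioning slices on demand rather than from a fixed, pre-carved partition.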
Taught by
Toronto Machine Learning Series (TMLS)
Related Courses
Machine Learning Operations (MLOps): Getting Started - Google Cloud via Coursera
Design and Implementation of Machine Learning Systems - Higher School of Economics via Coursera
Demystifying Machine Learning Operations (MLOps) - Pluralsight
Machine Learning Engineer with Microsoft Azure - Microsoft via Udacity
Machine Learning Engineering for Production (MLOps) - DeepLearning.AI via Coursera