Lessons Learned from Scaling Large Language Models in Production
Offered By: MLOps World: Machine Learning in Production via YouTube
Course Description
Overview
Explore the challenges and solutions for scaling large language models (LLMs) in production environments during this 40-minute conference talk from MLOps World: Machine Learning in Production. Gain insights from Matt Squire, CTO of Fuzzy Labs, as he shares valuable lessons learned from building and running LLMs at scale for customers. Discover how to overcome the complexities of high-traffic demands, slow LLM inference, and expensive GPU resources. Learn about performance profiling techniques, optimizing GPU utilization, and implementing effective guardrails through real code examples. Understand the nuances of scaling beyond basic RAG applications with open-source models like Mistral, and acquire practical knowledge to enhance your LLM deployment strategies for production-level performance and efficiency.
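As context for the "performance profiling techniques" the talk covers, the following is a minimal, illustrative Python sketch (not taken from the talk) for measuring two metrics commonly profiled when scaling LLM inference: time to first token and token throughput. The fake_stream generator is a hypothetical stand-in for a real streaming response from an inference server.

```python
import time


def profile_streaming_inference(stream_tokens):
    """Measure latency metrics for a streaming LLM response.

    `stream_tokens` is any iterable that yields tokens as they are
    generated (e.g. a streaming client response); this helper only
    records timing and does not depend on any particular LLM library.
    """
    start = time.perf_counter()
    first_token_latency = None
    token_count = 0

    for _ in stream_tokens:
        now = time.perf_counter()
        if first_token_latency is None:
            first_token_latency = now - start  # time to first token (TTFT)
        token_count += 1

    total = time.perf_counter() - start
    throughput = token_count / total if total > 0 else 0.0
    return {
        "time_to_first_token_s": first_token_latency,
        "total_latency_s": total,
        "tokens_generated": token_count,
        "tokens_per_second": throughput,
    }


if __name__ == "__main__":
    # Dummy token stream standing in for a real model call,
    # purely to demonstrate how the profiler is used.
    def fake_stream():
        for token in ["Hello", ",", " world", "!"]:
            time.sleep(0.05)  # simulate per-token generation delay
            yield token

    print(profile_streaming_inference(fake_stream()))
```

In practice the same wrapper can be pointed at the streaming output of whatever serving stack is in use, making it easy to compare GPU utilization and latency trade-offs across configurations.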
Syllabus
Lessons learned from scaling large language models in production
Taught by
MLOps World: Machine Learning in Production
Related Courses
Machine Learning Operations (MLOps): Getting Started - Google Cloud via Coursera
Design and Implementation of Machine Learning Systems (Проектирование и реализация систем машинного обучения) - Higher School of Economics via Coursera
Demystifying Machine Learning Operations (MLOps) - Pluralsight
Machine Learning Engineer with Microsoft Azure - Microsoft via Udacity
Machine Learning Engineering for Production (MLOps) - DeepLearning.AI via Coursera