YoVDO

Training Large Language Models on Kubernetes

Offered By: CNCF [Cloud Native Computing Foundation] via YouTube

Tags

Kubernetes Courses, Machine Learning Courses, High Performance Computing Courses, GPU Computing Courses, Scalability Courses, Distributed Computing Courses, Model Training Courses

Course Description

Overview

Explore the challenges and best practices of training Large Language Models (LLMs) on Kubernetes in this conference talk. Discover how to optimize networking, manage distributed resources, schedule effectively, and adapt code for LLM training on K8s. Learn about pre-made configurations, data pre-processing workflows, and training setups based on NVIDIA's Megatron Transformer framework to quickly start LLM training on Kubernetes. Compare training throughput between bare-metal and K8s-based environments for models like GPT, T5, and BERT across various GPU node configurations. Gain insights into the massive computational requirements of LLMs and how Kubernetes can be leveraged for their training as an alternative to traditional bare-metal clusters managed by high-performance computing schedulers such as Slurm.
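To illustrate the kind of "pre-made configuration" the talk refers to, here is a minimal, hypothetical sketch of a Kubernetes Job requesting GPUs for a single training worker. The image tag, entry-point script, and resource counts are illustrative assumptions, not taken from the talk; real Megatron setups typically involve multi-node launchers and shared storage as well.

```yaml
# Hypothetical sketch: one GPU training worker as a Kubernetes Job.
# Image, command, and GPU count are assumptions for illustration only.
apiVersion: batch/v1
kind: Job
metadata:
  name: llm-train-worker
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: trainer
          image: nvcr.io/nvidia/pytorch:24.01-py3   # assumed container image
          command: ["python", "pretrain_gpt.py"]    # illustrative Megatron-style entry point
          resources:
            limits:
              nvidia.com/gpu: 8   # request all GPUs on an 8-GPU node
```

Requesting `nvidia.com/gpu` resources requires the NVIDIA device plugin to be installed on the cluster; multi-node training additionally needs a coordination mechanism (e.g. a headless Service or an operator such as Kubeflow's training operator) so workers can discover each other.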

Syllabus

Training Large Language Models on Kubernetes - Ronen Dar, Run:ai


Taught by

CNCF [Cloud Native Computing Foundation]

Related Courses

Financial Sustainability: The Numbers side of Social Enterprise
+Acumen via NovoEd
Cloud Computing Concepts: Part 2
University of Illinois at Urbana-Champaign via Coursera
Developing Repeatable Models® to Scale Your Impact
+Acumen via Independent
Managing Microsoft Windows Server Active Directory Domain Services
Microsoft via edX
Introduction aux conteneurs (Introduction to Containers)
Microsoft Virtual Academy via OpenClassrooms