
Characterization of Large Language Model Development in the Datacenter

Offered By: USENIX via YouTube

Tags

Deep Learning Courses, Performance Evaluation Courses, Fault Tolerance Courses, Cluster Management Courses

Course Description

Overview

This 17-minute conference talk from NSDI '24 presents an in-depth characterization study of Large Language Model (LLM) development in datacenters. It covers the challenges and opportunities of using large-scale cluster resources efficiently for LLM development, including hardware failures, parallelization strategies, and resource utilization. The talk contrasts LLMs with traditional task-specific Deep Learning workloads and identifies potential optimizations for LLM-tailored systems. It also introduces two approaches for LLM development environments: fault-tolerant pretraining, designed to enhance fault tolerance, and decoupled scheduling for evaluation, designed to deliver timely performance feedback.

Syllabus

NSDI '24 - Characterization of Large Language Model Development in the Datacenter


Taught by

USENIX
