
Enabling Efficient Trillion Parameter Scale Training for Deep Learning Models

Offered By: MLOps.community via YouTube

Tags

Deep Learning Courses
Artificial Intelligence Courses
Distributed Training Courses

Course Description

Overview

Explore the challenges and solutions for efficient trillion-parameter scale training of deep learning models in this conference talk from the AI in Production Conference. Delve into DeepSpeed, a deep learning optimization library designed to make distributed model training and inference efficient, effective, and easy on commodity hardware. Learn about training optimizations that improve memory, compute, and data efficiency for extreme model scaling. Gain insights from Olatunji (Tunji) Ruwase, co-founder and lead of the DeepSpeed project at Microsoft, as he shares his expertise in building systems, convergence optimizations, and frameworks for distributed training and inference of deep learning models.
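
For readers unfamiliar with the library, here is a minimal sketch of how a PyTorch training loop is typically wrapped with DeepSpeed. The toy model, batch size, and ZeRO stage below are illustrative choices, not taken from the talk, and a script like this is normally launched with the deepspeed CLI (e.g. `deepspeed train.py`):

```python
# Minimal sketch of DeepSpeed usage; model and config values are placeholders.
import torch
import deepspeed

# Toy model standing in for a real large transformer.
model = torch.nn.Linear(1024, 1024)

# DeepSpeed is configured via a JSON-style dict. ZeRO stage 3 partitions
# optimizer states, gradients, and parameters across data-parallel ranks,
# the kind of memory optimization that enables extreme model scale.
ds_config = {
    "train_batch_size": 4,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 3},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# deepspeed.initialize returns an engine that handles distributed setup,
# mixed precision, and ZeRO partitioning behind a familiar PyTorch loop.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

for step in range(10):
    # Dummy half-precision batch; a real run would pull from a DataLoader.
    x = torch.randn(4, 1024, device=model_engine.device, dtype=torch.half)
    loss = model_engine(x).float().pow(2).mean()
    model_engine.backward(loss)  # engine-managed backward (loss scaling etc.)
    model_engine.step()          # optimizer step plus ZeRO bookkeeping
```

The key design point is that the engine, not the user, owns the distributed and memory-efficiency machinery: swapping ZeRO stages or precision modes is a config change rather than a rewrite of the training loop.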

Syllabus

Enabling Efficient Trillion Parameter Scale Training for Deep Learning Models // Tunji Ruwase


Taught by

MLOps.community

Related Courses

Custom and Distributed Training with TensorFlow
DeepLearning.AI via Coursera
Architecting Production-ready ML Models Using Google Cloud ML Engine
Pluralsight
Building End-to-end Machine Learning Workflows with Kubeflow
Pluralsight
Deploying PyTorch Models in Production: PyTorch Playbook
Pluralsight
Inside TensorFlow
TensorFlow via YouTube