YoVDO

Horovod - Distributed Deep Learning for Reliable MLOps

Offered By: Linux Foundation via YouTube

Tags

MLOps Courses Benchmarking Courses Distributed Deep Learning Courses Horovod Courses

Course Description

Overview

Explore distributed deep learning techniques and reliable MLOps practices at Uber in this 30-minute conference talk by Travis Addair. Learn how Horovod was first adopted, review distributed deep learning concepts, and compare parameter servers with Horovod's Allreduce technique. Examine benchmarking results, see how deep learning is used in research and in production, and learn how a feature store supplies data for model training. Investigate preprocessing techniques, Spark ML pipelines, and Petastorm for data access in deep learning training. Address the challenges of training on large datasets, explore Spark 3.0's resource-aware scheduling, and learn how Horovod Lambda runs data processing on CPUs with Spark when the cluster has no GPUs. Gain insights into online prediction with Neuropod's out-of-process execution, workflow authoring, and how to ideate, define, evaluate, and deploy a deep learning model within a single script. Conclude with an overview of feature engineering, model construction, model deployment, and Elastic Horovod's control flow.
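The talk contrasts parameter servers with Horovod's Allreduce technique. As a rough illustration only, the sketch below simulates the classic ring-allreduce pattern (a reduce-scatter phase followed by an allgather phase) in plain Python, with lists standing in for each worker's gradient buffer. It is not Horovod's implementation: Horovod hands the actual communication to libraries such as MPI, NCCL, or Gloo, and the function name `ring_allreduce` is hypothetical.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum-allreduce over a ring of len(buffers) workers.

    Hypothetical helper for illustration; not a Horovod API.
    Each buffer is split into n chunks. After n - 1 reduce-scatter steps,
    every worker owns one fully summed chunk; after n - 1 allgather steps,
    every worker holds the full summed buffer.
    """
    n = len(buffers)
    length = len(buffers[0])
    assert all(len(b) == length for b in buffers), "buffers must match in length"
    data = [list(b) for b in buffers]  # copy so the inputs are not mutated
    # Chunk c covers indices [bounds[c], bounds[c + 1]).
    bounds = [c * length // n for c in range(n + 1)]

    def chunk(c: int) -> range:
        return range(bounds[c], bounds[c + 1])

    # Reduce-scatter: at step s, worker i sends chunk (i - s) mod n to its
    # right neighbour, which adds the received values into its own copy.
    for s in range(n - 1):
        # Snapshot all payloads first so every send in a step is "simultaneous".
        sends = [(i, (i - s) % n) for i in range(n)]
        payloads = [[data[i][k] for k in chunk(c)] for i, c in sends]
        for (i, c), payload in zip(sends, payloads):
            dst = (i + 1) % n
            for k, v in zip(chunk(c), payload):
                data[dst][k] += v

    # Worker i now holds the fully reduced chunk (i + 1) mod n.
    # Allgather: at step s, worker i forwards chunk (i + 1 - s) mod n to its
    # right neighbour, which overwrites its stale copy of that chunk.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n) for i in range(n)]
        payloads = [[data[i][k] for k in chunk(c)] for i, c in sends]
        for (i, c), payload in zip(sends, payloads):
            dst = (i + 1) % n
            for k, v in zip(chunk(c), payload):
                data[dst][k] = v

    return data


if __name__ == "__main__":
    grads = [
        [1.0, 2.0, 3.0, 4.0],
        [10.0, 20.0, 30.0, 40.0],
        [100.0, 200.0, 300.0, 400.0],
    ]
    for worker, buf in enumerate(ring_allreduce(grads)):
        print(worker, buf)  # every worker prints [111.0, 222.0, 333.0, 444.0]
```

Dividing the summed buffer by the number of workers gives the averaged gradient used in data-parallel training. The design point behind the comparison in the talk: in a ring, each worker sends about 2(n - 1)/n times its buffer size regardless of cluster size, while a single parameter server must receive a full buffer from every worker, so it can become a bandwidth bottleneck as workers are added.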

Syllabus

Intro
Early Adoption of Horovod
Deep Learning Refresher
Distributed Deep Learning
Early Distributed Training - Parameter Servers
Parameter Servers - Tradeoffs
Horovod Technique: Allreduce
Benchmarking
Deep Learning in Research
Deep Learning in Production
Feature Store
Model Training
Preprocessing
Spark ML Pipelines
Petastorm: Data Access for Deep Learning Training
Challenges of Training on Large Datasets
Spark 3.0: Resource Aware Scheduling
What if my Spark cluster doesn't have GPUs?
Horovod Lambda - Run data processing on CPUs with Spark
Online Prediction
Neuropod: Out-of-Process Execution
Workflow Authoring
Can we ideate, define, evaluate and deploy a Deep Learning model all within a single script?
Feature Engineering
Model Construction
Model Deployment
Elastic Horovod: Control Flow


Taught by

Linux Foundation


Related Courses

Challenges and Opportunities in Applying Machine Learning - Alex Jaimes - ODSC East 2018
Open Data Science via YouTube
Efficient Distributed Deep Learning Using MXNet
Simons Institute via YouTube
Benchmarks and How-Tos for Convolutional Neural Networks on HorovodRunner-Enabled Apache Spark Clusters
Databricks via YouTube
SHADE - Enable Fundamental Cacheability for Distributed Deep Learning Training
USENIX via YouTube
Alpa - Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
USENIX via YouTube