Horovod - Distributed Deep Learning for Reliable MLOps
Offered By: Linux Foundation via YouTube
Course Description
Syllabus
Intro
Early Adoption of Horovod
Deep Learning Refresher
Distributed Deep Learning
Early Distributed Training - Parameter Servers
Parameter Servers - Tradeoffs
Horovod Technique: Allreduce
Benchmarking
Deep Learning in Research
Deep Learning in Production
Feature Store
Model Training
Preprocessing
Spark ML Pipelines
Petastorm: Data Access for Deep Learning Training
Challenges of Training on Large Datasets
Spark 3.0: Resource Aware Scheduling
What if my Spark cluster doesn't have GPUs?
Horovod Lambda - Run data processing on CPUs with Spark
Online Prediction
Neuropod: Out-of-Process Execution
Workflow Authoring
Can we ideate, define, evaluate and deploy a Deep Learning model all within a single script?
Feature Engineering
Model Construction
Model Deployment
Elastic Horovod: Control Flow
Taught by
Linux Foundation
Related Courses
Challenges and Opportunities in Applying Machine Learning - Alex Jaimes - ODSC East 2018
Open Data Science via YouTube
Efficient Distributed Deep Learning Using MXNet
Simons Institute via YouTube
Benchmarks and How-Tos for Convolutional Neural Networks on HorovodRunner-Enabled Apache Spark Clusters
Databricks via YouTube
SHADE - Enable Fundamental Cacheability for Distributed Deep Learning Training
USENIX via YouTube
Alpa - Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
USENIX via YouTube