
Zero-Copy Model Loading with Ray and PyTorch for Efficient Deep Learning Inference

Offered By: Anyscale via YouTube

Tags

Deep Learning Courses, PyTorch Courses, Benchmarking Courses, Model Deployment Courses, Ray Serve Courses

Course Description

Overview

Discover how to significantly reduce the cost of loading deep learning models for inference in production environments through zero-copy model loading techniques using PyTorch and Ray. Learn about storing model weights in shared memory for near-instantaneous access across processes, and explore practical code examples demonstrating implementation. Gain insights into the open-source zerocopy library, which simplifies the process of applying zero-copy model loading to PyTorch models with minimal code changes. Examine a benchmark study showcasing the performance benefits of running NLP models with stateless Ray tasks, resulting in a self-tuning model deployment that outperforms traditional Ray Serve deployments. Delve into topics such as model serving basics, loading PyTorch tensors without data copying, and implementing pre- and post-processing with Ray Serve.
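
The core idea in the description, keeping weights in shared memory so that each process attaches to them instead of loading its own copy, can be illustrated with a minimal sketch. This is not the course's code and not the zerocopy library's API: it uses only Python's standard multiprocessing.shared_memory module in place of Ray and PyTorch, and the helper names publish_weights, attach_weights, and demo are hypothetical.

```python
from multiprocessing import shared_memory

FLOAT_SIZE = 4  # bytes per 32-bit float (memoryview format "f")


def publish_weights(weights):
    """Copy the weights once into a new shared-memory block (hypothetical helper)."""
    shm = shared_memory.SharedMemory(create=True, size=FLOAT_SIZE * len(weights))
    view = shm.buf.cast("f")  # reinterpret the raw bytes as floats, no copy
    for i, w in enumerate(weights):
        view[i] = w
    view.release()  # views must be released before the block can be closed
    return shm


def attach_weights(name, count):
    """Open an existing block by name and return a float view over it."""
    shm = shared_memory.SharedMemory(name=name)
    view = shm.buf.cast("f")[:count]  # a window onto shared memory, not a copy
    return shm, view


def demo():
    owner = publish_weights([0.5, -1.25, 2.0])
    reader, reader_view = attach_weights(owner.name, 3)
    loaded = list(reader_view)

    # A write through the owner's view is visible to the reader,
    # which shows that both views address the same memory.
    owner_view = owner.buf.cast("f")
    owner_view[0] = 9.0
    seen_by_reader = reader_view[0]

    # Release the views, then close both handles and free the block.
    owner_view.release()
    reader_view.release()
    reader.close()
    owner.close()
    owner.unlink()
    return loaded, seen_by_reader
```

Calling demo() attaches a second handle to the same block by name, which is what a separate worker process would do. The attach step costs the same regardless of how large the block is, since no weight data is copied.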

Syllabus

Intro
Model Serving 101
Loading PyTorch tensors without copying data
Model inference on Ray using stateless tasks
Summary: Model inference with zero-copy loading
A simple benchmark
Pre- and post-processing with Ray Serve
Benchmark implementation
Benchmark Results


Taught by

Anyscale

Related Courses

Deep Learning with Python and PyTorch
IBM via edX
Introduction to Machine Learning
Duke University via Coursera
How Google does Machine Learning (in Brazilian Portuguese)
Google Cloud via Coursera
Intro to Deep Learning with PyTorch
Facebook via Udacity
Secure and Private AI
Facebook via Udacity