YoVDO

Zero-Copy Model Loading with Ray and PyTorch for Efficient Deep Learning Inference

Offered By: Anyscale via YouTube

Tags

Deep Learning Courses
PyTorch Courses
Benchmarking Courses
Model Deployment Courses
Ray Serve Courses

Course Description

Overview

Discover how to significantly reduce the cost of loading deep learning models for inference in production environments through zero-copy model loading techniques using PyTorch and Ray. Learn about storing model weights in shared memory for near-instantaneous access across processes, and explore practical code examples demonstrating implementation. Gain insights into the open-source zerocopy library, which simplifies the process of applying zero-copy model loading to PyTorch models with minimal code changes. Examine a benchmark study showcasing the performance benefits of running NLP models with stateless Ray tasks, resulting in a self-tuning model deployment that outperforms traditional Ray Serve deployments. Delve into topics such as model serving basics, loading PyTorch tensors without data copying, and implementing pre- and post-processing with Ray Serve.
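
The idea of loading PyTorch tensors without copying data can be illustrated with a minimal sketch. It is not the code from the talk and not the API of the zerocopy library: the helper names `extract_tensors` and `replace_tensors` are assumptions chosen for this example. The sketch splits a model into a weight-free skeleton plus NumPy arrays, then rebuilds the model with `torch.as_tensor`, which wraps an existing NumPy buffer instead of copying it.

```python
import copy

import numpy as np
import torch


def extract_tensors(model):
    """Hypothetical helper: return (weight-free skeleton, per-module weight arrays)."""
    tensors = []
    for _, module in model.named_modules():
        # Only the tensors owned directly by this module (recurse=False).
        params = {
            name: p.detach().cpu().numpy()
            for name, p in module.named_parameters(recurse=False)
        }
        buffers = {
            name: b.detach().cpu().numpy()
            for name, b in module.named_buffers(recurse=False)
        }
        tensors.append({"params": params, "buffers": buffers})

    # Copy the model once, then drop every tensor from the copy.
    skeleton = copy.deepcopy(model)
    for _, module in skeleton.named_modules():
        names = [n for n, _ in module.named_parameters(recurse=False)]
        names += [n for n, _ in module.named_buffers(recurse=False)]
        for name in names:
            setattr(module, name, None)
    skeleton.eval()
    return skeleton, tensors


def replace_tensors(skeleton, tensors):
    """Hypothetical helper: attach the arrays to the skeleton without copying them."""
    modules = [m for _, m in skeleton.named_modules()]
    for module, entry in zip(modules, tensors):
        for name, array in entry["params"].items():
            # torch.as_tensor reuses the NumPy buffer rather than copying it.
            tensor = torch.as_tensor(array)
            module.register_parameter(
                name, torch.nn.Parameter(tensor, requires_grad=False)
            )
        for name, array in entry["buffers"].items():
            module.register_buffer(name, torch.as_tensor(array))


# Small stand-in model; a production NLP model would be handled the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2)
)
skeleton, weights = extract_tensors(model)
replace_tensors(skeleton, weights)

x = torch.ones(1, 4)
with torch.no_grad():
    same_output = torch.equal(model(x), skeleton(x))

# Entry 0 is the Sequential container itself; entry 1 is the first Linear layer.
shares_buffer = np.shares_memory(
    skeleton[0].weight.detach().numpy(), weights[1]["params"]["weight"]
)
print(same_output, shares_buffer)
```

The sketch runs in a single process and does not depend on Ray. In a Ray deployment, the skeleton and weight arrays would typically be stored once with `ray.put` and fetched in each task with `ray.get`, which returns NumPy arrays as read-only views over shared memory, so the rebuild step stays cheap for every process on the same node.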

Syllabus

Intro
Model Serving 101
Loading PyTorch tensors without copying data
Model inference on Ray using stateless tasks
Summary: Model inference with zero-copy loading
A simple benchmark
Pre- and post-processing with Ray Serve
Benchmark implementation
Benchmark Results


Taught by

Anyscale

Related Courses

Investment Strategies and Portfolio Analysis (Rice University via Coursera)
Advanced R Programming (Johns Hopkins University via Coursera)
Supply Chain Analytics (Rutgers University via Coursera)
Technological Entrepreneurship (Технологическое предпринимательство) (Moscow Institute of Physics and Technology via Coursera)
Learn How To Code: Google's Go (golang) Programming Language (Udemy)