Zero-Copy Model Loading with Ray and PyTorch for Efficient Deep Learning Inference

Offered By: Anyscale via YouTube

Course Description

Overview


Discover how zero-copy model loading with PyTorch and Ray reduces the cost of loading deep learning models for inference in production. Learn how storing model weights in shared memory gives multiple processes near-instantaneous access to them, and follow practical code examples of the technique. Get to know the open-source zerocopy library, which applies zero-copy model loading to PyTorch models with minimal code changes. Examine a benchmark study that runs NLP models as stateless Ray tasks, producing a self-tuning model deployment that outperforms traditional Ray Serve deployments. Topics include model serving basics, loading PyTorch tensors without copying data, and pre- and post-processing with Ray Serve.
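The shared-memory idea described above can be sketched without PyTorch or Ray. The following is a minimal illustration using only Python's standard library. It is not the code shown in the session and not the zerocopy library's API, and the function names publish_weights and attach_weights are hypothetical. One side pays a single copy to place float32 weights in a named shared-memory block; any other process that knows the block's name can then map it and read the values through a view, without copying them again.

```python
import array
from multiprocessing import shared_memory

# Size in bytes of one C float on this platform (4 on common platforms).
ITEM_SIZE = array.array("f").itemsize


def publish_weights(values):
    """Copy float32 weights into a new shared-memory block (one-time cost).

    Hypothetical helper for illustration; `values` must be non-empty.
    Returns the owning SharedMemory handle and the number of weights.
    """
    weights = array.array("f", values)
    nbytes = len(weights) * ITEM_SIZE
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    shm.buf[:nbytes] = weights.tobytes()
    return shm, len(weights)


def attach_weights(name, count):
    """Map an existing block by name and view it as float32 without copying.

    Hypothetical helper for illustration. The block may be rounded up to a
    page size, so the view is limited to the bytes that hold the weights.
    """
    shm = shared_memory.SharedMemory(name=name)
    view = shm.buf[: count * ITEM_SIZE].cast("f")
    return shm, view
```

A reader would call attach_weights(shm.name, count), use the returned view, and then call view.release() and close() on its handle; the publisher calls close() and unlink() once the weights are no longer needed. With PyTorch, a buffer like this is typically wrapped as a tensor rather than read element by element, which is the step the syllabus item on loading tensors without copying data refers to.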

Syllabus

Intro
Model Serving 101
Loading PyTorch tensors without copying data
Model inference on Ray using stateless tasks
Summary: Model inference with zero-copy loading
A simple benchmark
Pre- and post-processing with Ray Serve
Benchmark implementation
Benchmark results

Taught by

Anyscale
