Zero-Copy Model Loading with Ray and PyTorch for Efficient Deep Learning Inference
Offered By: Anyscale via YouTube
Course Description
Overview
Learn how zero-copy model loading with PyTorch and Ray cuts the cost of loading deep learning models for inference in production. The talk covers storing model weights in shared memory so that any process on the machine can access them almost instantly, and walks through practical code examples of the implementation. It introduces the open-source zerocopy library, which applies zero-copy model loading to PyTorch models with minimal code changes. A benchmark study then runs NLP models as stateless Ray tasks, producing a self-tuning model deployment that outperforms traditional Ray Serve deployments. Topics include model serving basics, loading PyTorch tensors without copying data, and implementing pre- and post-processing with Ray Serve.
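The shared-memory idea in the description can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the talk's code and not the zerocopy library's API: `multiprocessing.shared_memory` stands in for Ray's shared-memory object store, a flat list of floats stands in for model weights, and the helper names `publish_weights`, `attach_weights`, and `demo` are hypothetical.

```python
from array import array
from multiprocessing import shared_memory

FLOAT_SIZE = array("f").itemsize  # bytes per native float (the "f" format)


def publish_weights(weights):
    """Copy the weights into a new shared-memory block (the only copy made)."""
    nbytes = len(weights) * FLOAT_SIZE
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    shm.buf[:nbytes] = array("f", weights).tobytes()
    return shm


def attach_weights(name, count):
    """Map an existing block by name and view it as floats without copying."""
    shm = shared_memory.SharedMemory(name=name)
    # The block may be rounded up to a page size, so slice to the exact length.
    view = shm.buf[: count * FLOAT_SIZE].cast("f")
    return shm, view


def demo():
    weights = [0.5, -1.25, 2.0]  # hypothetical stand-in for model weights
    owner = publish_weights(weights)
    try:
        # Two independent attachments, as two worker processes would make.
        reader_a, view_a = attach_weights(owner.name, len(weights))
        reader_b, view_b = attach_weights(owner.name, len(weights))
        loaded = view_a.tolist()
        # A write through one view is visible through the other,
        # which shows both views point at the same memory.
        view_a[0] = 9.0
        shared = view_b[0] == 9.0
        # Views must be released before the mappings can be closed.
        view_a.release()
        view_b.release()
        reader_a.close()
        reader_b.close()
    finally:
        owner.close()
        owner.unlink()
    return loaded, shared
```

Calling `demo()` attaches to the block twice and checks that both views see the same bytes. A worker in another process would attach by name in the same way, so the cost of "loading" the weights is a memory mapping rather than a deserialization or copy.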
Syllabus
Intro
Model Serving 101
Loading PyTorch tensors without copying data
Model inference on Ray using stateless tasks
Summary: Model inference with zero-copy loading
A simple benchmark
Pre- and post-processing with Ray Serve
Benchmark implementation
Benchmark Results
Taught by
Anyscale
Related Courses
Investment Strategies and Portfolio Analysis
Rice University via Coursera
Advanced R Programming
Johns Hopkins University via Coursera
Supply Chain Analytics
Rutgers University via Coursera
Technological Entrepreneurship
Moscow Institute of Physics and Technology via Coursera
Learn How To Code: Google's Go (golang) Programming Language
Udemy