Scaling Inference Deployments with NVIDIA Triton Inference Server and Ray Serve
Offered By: Anyscale via YouTube
Course Description
Overview
Explore the collaboration between Ray Serve and NVIDIA Triton Inference Server in this conference talk from Ray Summit 2024. Learn about the new Python API for Triton Inference Server and how it integrates with Ray Serve applications. Discover how this partnership combines the strengths of both open-source platforms to scale inference deployments. Gain insights into improving ML model performance through a Stable Diffusion demo, and understand the benefits of Triton's optimization tools, Performance Analyzer and Model Analyzer. See how to fine-tune model configurations against specific throughput and latency requirements, so you can optimize your inference deployments effectively.
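The core pattern the talk describes is running Triton's in-process Python API (the tritonserver package) inside a Ray Serve deployment, so Ray Serve handles replica scaling and routing while Triton executes the models. Below is a minimal sketch of that pattern under stated assumptions: the model repository path, the stable_diffusion model name, and the input/output tensor names are illustrative, not details confirmed by the talk.

```python
# Sketch: Triton Inference Server's in-process Python API embedded in a
# Ray Serve deployment. Paths and model/tensor names are assumptions.
import numpy
import tritonserver
from ray import serve


@serve.deployment(ray_actor_options={"num_gpus": 1})
class TritonDeployment:
    def __init__(self):
        # Start an in-process Triton server inside each Ray Serve replica.
        self._server = tritonserver.Server(
            model_repository="/workspace/models",  # assumed repository path
            model_control_mode=tritonserver.ModelControlMode.EXPLICIT,
        )
        self._server.start(wait_until_ready=True)
        # Explicitly load the demo model (name assumed).
        self._model = self._server.load("stable_diffusion")

    def generate(self, prompt: str) -> numpy.ndarray:
        # Triton returns an iterable of responses; take the first one.
        for response in self._model.infer(inputs={"prompt": [[prompt]]}):
            return numpy.from_dlpack(response.outputs["generated_image"])


app = TritonDeployment.bind()
# serve.run(app) would deploy this locally; Ray Serve then manages
# replica scaling and request routing across the cluster.
```

From there, Triton's Performance Analyzer (perf_analyzer) and Model Analyzer can sweep concurrency levels and batch sizes against a deployed model to find the configuration that meets given throughput and latency targets.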
Syllabus
Scaling Inference Deployments with NVIDIA Triton Inference Server and Ray Serve | Ray Summit 2024
Taught by
Anyscale
Related Courses
Patterns of ML Models in Production - PyCon US via YouTube
Deploying Many Models Efficiently with Ray Serve - Anyscale via YouTube
Modernizing DoorDash Model Serving Platform with Ray Serve - Anyscale via YouTube
Ray for Large-Scale Time-Series Energy Forecasting to Plan a More Resilient Power Grid - Anyscale via YouTube
Enabling Cost-Efficient LLM Serving with Ray Serve - Anyscale via YouTube