Accelerated LLM Inference with Anyscale - Ray Summit 2024

Offered By: Anyscale via YouTube

Tags

Anyscale Courses vLLM Courses

Course Description

Overview

Explore cutting-edge advancements in LLM inference optimization and scalability in this 30-minute conference talk from Ray Summit 2024. Dive into Anyscale's latest enterprise and production features for accelerated LLM inference, presented by Co-Founder and CTO Philipp Moritz and Cody Yu. Learn about the team's collaborative work on the vLLM open-source project, including key improvements such as FP8 support, chunked prefill, multi-step decoding, and speculative decoding. Discover how these optimizations have delivered roughly 2x gains in both throughput and latency in vLLM. Gain insights into Anyscale-specific enhancements, including custom kernels, batch inference optimizations, and accelerated large-model loading for autoscaling deployments. Essential viewing for anyone interested in state-of-the-art techniques for improving LLM inference efficiency and scalability in enterprise and production environments.

Syllabus

Accelerated LLM Inference with Anyscale | Ray Summit 2024


Taught by

Anyscale

Related Courses

Finetuning, Serving, and Evaluating Large Language Models in the Wild
Open Data Science via YouTube
Cloud Native Sustainable LLM Inference in Action
CNCF [Cloud Native Computing Foundation] via YouTube
Optimizing Kubernetes Cluster Scaling for Advanced Generative Models
Linux Foundation via YouTube
LLaMa for Developers
LinkedIn Learning
Scaling Video Ad Classification Across Millions of Classes with GenAI
Databricks via YouTube