Accelerated LLM Inference with Anyscale - Ray Summit 2024
Offered By: Anyscale via YouTube
Course Description
Overview
Explore cutting-edge advancements in LLM inference optimization and scalability in this 30-minute conference talk from Ray Summit 2024. Dive into Anyscale's latest enterprise and production features for accelerated LLM inference, presented by Anyscale Co-Founder and CTO Philipp Moritz together with Cody Yu. Learn about the team's collaborative work on the vLLM open-source project, including key improvements such as FP8 support, chunked prefill, multi-step decoding, and speculative decoding. Discover how these optimizations have delivered roughly 2x improvements in both throughput and latency in vLLM. Gain insights into Anyscale-specific enhancements, including custom kernels, batch inference optimizations, and accelerated large-model loading for autoscaling deployments. Essential viewing for anyone interested in state-of-the-art techniques for improving LLM inference efficiency and scalability in enterprise and production environments.
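To make the feature list concrete, below is a minimal sketch of how several of the vLLM optimizations mentioned above can be switched on through vLLM's offline LLM entrypoint. This is an illustration, not Anyscale's code from the talk: the argument names (quantization, enable_chunked_prefill, speculative_model, num_speculative_tokens, num_scheduler_steps) reflect vLLM releases from around the time of Ray Summit 2024 (roughly v0.6.x) and may differ in newer versions, and the model names are placeholders.

# Minimal vLLM sketch (assumes vLLM ~0.6.x; flag names may change between releases).
from vllm import LLM, SamplingParams

# FP8 quantization and chunked prefill can be enabled together.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # placeholder model
    quantization="fp8",            # dynamic FP8 quantization of weights/activations
    enable_chunked_prefill=True,   # split long prefills into chunks scheduled alongside decodes
)

# Speculative decoding and multi-step decoding use separate engine arguments;
# in vLLM ~0.6.x they are not combinable with chunked prefill (or with each
# other), so they would be alternative configurations:
# llm = LLM(
#     model="meta-llama/Meta-Llama-3.1-70B-Instruct",             # placeholder target model
#     speculative_model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # placeholder draft model
#     num_speculative_tokens=5,    # draft tokens proposed per speculative step
# )
# llm = LLM(model="...", num_scheduler_steps=8)  # multi-step decoding

outputs = llm.generate(
    ["Explain chunked prefill in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)

Which combinations are supported, and which is fastest, depends on the vLLM version and hardware, so consult the engine-argument documentation for your release.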
Syllabus
Accelerated LLM Inference with Anyscale | Ray Summit 2024
Taught by
Anyscale
Related Courses
Finetuning, Serving, and Evaluating Large Language Models in the Wild (Open Data Science via YouTube)
Cloud Native Sustainable LLM Inference in Action (CNCF [Cloud Native Computing Foundation] via YouTube)
Optimizing Kubernetes Cluster Scaling for Advanced Generative Models (Linux Foundation via YouTube)
LLaMa for Developers (LinkedIn Learning)
Scaling Video Ad Classification Across Millions of Classes with GenAI (Databricks via YouTube)