Databricks' vLLM Optimization for Cost-Effective LLM Inference - Ray Summit 2024

Offered By: Anyscale via YouTube

Tags

vLLM Courses, Machine Learning Courses, Databricks Courses, Quantization Courses, Benchmarking Courses

Course Description

Overview

Explore Databricks' approach to optimizing vLLM for faster LLM inference in this Ray Summit 2024 presentation. Megha Agarwal and her team at Databricks (MosaicML) describe how they tackle GPU blocking operations during decoding steps, which can significantly degrade performance for large models, and present their solutions for reducing GPU idle time and accelerating quantization with custom kernels. The talk also covers future optimization areas and best practices for benchmarking LLM deployments. Aimed at organizations and developers working on large-scale LLM projects, it offers practical strategies for improving inference efficiency and reducing the cost of LLM serving products.

Syllabus

Databricks' vLLM Optimization for Cost-Effective LLM Inference | Ray Summit 2024


Taught by

Anyscale

Related Courses

Data Processing with Azure
LearnQuest via Coursera
Mejores prácticas para el procesamiento de datos en Big Data
Coursera Project Network via Coursera
Data Science with Databricks for Data Analysts
Databricks via Coursera
Azure Data Engineer con Databricks y Azure Data Factory
Coursera Project Network via Coursera
Curso Completo de Spark con Databricks (Big Data)
Coursera Project Network via Coursera