YoVDO

Enabling Cost-Efficient LLM Serving with Ray Serve

Offered By: Anyscale via YouTube

Tags

Distributed Systems Courses, Autoscaling Courses, Cost Optimization Courses, Hugging Face Courses, Ray Serve Courses, vLLM Courses

Course Description

Overview

This 30-minute conference talk by Anyscale covers how Ray Serve enables cost-efficient Large Language Model (LLM) serving. The talk presents Ray Serve as the most economical and straightforward way to deploy LLMs, noting that it has processed billions of tokens in Anyscale Endpoints. It walks through the cost-reduction strategies Ray Serve employs: fine-grained autoscaling, continuous batching, and model-parallel inference. It also describes the work done to make any Hugging Face model deployable with these optimizations. Two mechanisms drive the savings: fine-grained autoscaling, which lets a deployment run on fewer GPUs, and integration with libraries such as vLLM, which maximizes GPU utilization. The talk is set in the context of Ray, described as the leading open-source framework for scaling and productionizing AI workloads across a range of domains.
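To give a rough sense of why continuous batching reduces GPU time, here is a minimal pure-Python sketch. It is not taken from the talk and does not use the Ray Serve or vLLM APIs; the function names and the toy workload are assumptions made for illustration. It counts decoding steps for a queue of requests with different output lengths under static batching (each batch waits for its longest request) and under continuous batching (a freed slot is refilled immediately).

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Steps needed when each fixed batch runs until its longest request ends.

    `lengths` holds the number of decoding steps each request needs (>= 1).
    Hypothetical helper for illustration only.
    """
    total = 0
    for i in range(0, len(lengths), batch_size):
        # The whole batch occupies the GPU until its slowest member finishes.
        total += max(lengths[i:i + batch_size])
    return total


def continuous_batching_steps(lengths, batch_size):
    """Steps needed when finished requests are replaced at every step.

    Hypothetical helper for illustration only.
    """
    queue = deque(lengths)
    active = []  # remaining steps for each in-flight request
    steps = 0
    while queue or active:
        # Fill any free slots from the waiting queue before the next step.
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
        steps += 1
        # Advance every in-flight request by one step; drop the finished ones.
        active = [remaining - 1 for remaining in active if remaining > 1]
    return steps


# Toy workload (assumed): two short and two long requests, two batch slots.
workload = [2, 8, 2, 8]
static_total = static_batching_steps(workload, batch_size=2)
continuous_total = continuous_batching_steps(workload, batch_size=2)
print(static_total, continuous_total)
```

In this toy workload the short requests free their slots early, so the continuous scheduler finishes in fewer steps than the static one. Real serving engines add concerns this sketch ignores, such as prefill cost and memory limits.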

Syllabus

Enabling Cost-Efficient LLM Serving with Ray Serve


Taught by

Anyscale

Related Courses

Hugging Face on Azure - Partnership and Solutions Announcement
Microsoft via YouTube
Question Answering in Azure AI - Custom and Prebuilt Solutions - Episode 49
Microsoft via YouTube
Open Source Platforms for MLOps
Duke University via Coursera
Masked Language Modelling - Retraining BERT with Hugging Face Trainer - Coding Tutorial
rupert ai via YouTube
Masked Language Modelling with Hugging Face - Microsoft Sentence Completion - Coding Tutorial
rupert ai via YouTube