Optimizing LLM Inference with AWS Trainium, Ray, vLLM, and Anyscale
Offered By: Anyscale via YouTube
Course Description
Overview
Discover how to optimize large language model (LLM) inference using AWS Trainium, Ray, vLLM, and Anyscale in this 46-minute webinar. Learn to scale and productionize LLM workloads cost-effectively by leveraging AWS accelerator instances, including AWS Inferentia, for reliable LLM serving at scale. Explore building a complete LLM inference stack with vLLM and Ray on Amazon EKS, and understand Anyscale's performance and enterprise capabilities for demanding LLM and GenAI inference workloads. Gain insight into using AWS Inferentia accelerators for leading price-performance, running AWS compute instances on Anyscale for optimized LLM inference, and using Anyscale's managed enterprise LLM inference offering with its advanced cluster-management optimizations. Ideal for AI engineers seeking to operationalize generative AI models cost-efficiently at scale, and for infrastructure engineers planning to support GenAI use cases and LLM inference in their organizations.
Syllabus
Optimizing LLM Inference with AWS Trainium, Ray, vLLM, and Anyscale
Taught by
Anyscale
Related Courses
Finetuning, Serving, and Evaluating Large Language Models in the Wild - Open Data Science via YouTube
Cloud Native Sustainable LLM Inference in Action - CNCF [Cloud Native Computing Foundation] via YouTube
Optimizing Kubernetes Cluster Scaling for Advanced Generative Models - Linux Foundation via YouTube
LLaMa for Developers - LinkedIn Learning
Scaling Video Ad Classification Across Millions of Classes with GenAI - Databricks via YouTube