Serving Large Language Models with KubeRay on TPUs
Offered By: Anyscale via YouTube
Course Description
Overview
Discover how to serve large language models using KubeRay on TPUs in this 25-minute talk from Anyscale. Learn about the technical challenges of serving models with hundreds of billions of parameters and explore how integrating KubeRay with TPUs creates a powerful platform for efficient LLM deployment. Gain insights into the benefits of this approach, including increased performance, improved scalability, reduced costs, enhanced flexibility, and better monitoring capabilities. Understand how KubeRay simplifies Ray cluster management on cloud platforms, while TPUs provide specialized processing power for neural network workloads. Access the accompanying slide deck for visual references and dive deeper into the world of distributed machine learning with Ray, the popular open-source framework for scaling AI workloads.
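The deployment pattern the talk describes, a KubeRay-managed Ray cluster with TPU worker nodes on Kubernetes, can be sketched as a RayCluster manifest. The sketch below is illustrative, not taken from the talk: the `google.com/tpu` resource name and the `gke-tpu-*` node-selector labels follow GKE's TPU conventions, while the cluster name, image tag, TPU type, and topology are assumptions.

```yaml
# Hypothetical RayCluster sketch: one CPU head node plus a TPU worker group.
# TPU type, topology, and image tag are illustrative assumptions.
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: llm-serving-tpu
spec:
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0   # assumed image tag
            resources:
              limits:
                cpu: "4"
                memory: 16Gi
  workerGroupSpecs:
    - groupName: tpu-workers
      replicas: 1
      minReplicas: 1
      maxReplicas: 2
      rayStartParams: {}
      template:
        spec:
          # GKE-style node selectors pinning workers to a TPU slice.
          nodeSelector:
            cloud.google.com/gke-tpu-accelerator: tpu-v4-podslice
            cloud.google.com/gke-tpu-topology: 2x2x1
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0
              resources:
                limits:
                  google.com/tpu: "4"   # TPU chips requested per pod
```

With a manifest like this applied, KubeRay handles pod lifecycle and scaling of the worker group, and Ray can schedule model-serving replicas onto the TPU-backed workers.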
Syllabus
Serving Large Language Models with KubeRay on TPUs
Taught by
Anyscale
Related Courses
Sailing Ray Workloads with KubeRay and Kueue in Kubernetes (CNCF [Cloud Native Computing Foundation] via YouTube)
Accelerate Your GenAI Model Inference with Ray and Kubernetes (CNCF [Cloud Native Computing Foundation] via YouTube)
Forecasting Covid Infections for the UK's NHS Using Ray and Kubernetes (Anyscale via YouTube)
KubeRay: A Ray Cluster Management Solution on Kubernetes (Anyscale via YouTube)
The Different Shades of Using KubeRay with Kubernetes (Anyscale via YouTube)