
Resource-Aware Scheduling for Production GenAI with RAG on Multicluster Cloud Kubernetes

Offered By: CNCF [Cloud Native Computing Foundation] via YouTube

Tags

Kubernetes, Cloud Computing, Vector Databases, Load Balancing, Retrieval-Augmented Generation

Course Description

Overview

This session presents a resource-aware approach to scheduling production GenAI workloads that use Retrieval-Augmented Generation (RAG) across multiple cloud Kubernetes clusters. It covers the advantages of self-hosting GenAI models (greater control, privacy, performance, and cost-effectiveness) and explains why Kubernetes cloud resource management makes a flexible hosting platform for these systems.

The proposed architecture runs on multiple Kubernetes clusters coordinated by a resource-aware, policy-based cluster scheduler. Its key components are a vector database that holds the RAG context, load-balanced query services, prediction services that execute the models, and ingestion services that update the vector database. The session discusses the benefits of a cloud-native, multi-region, scalable vector database and of running these services across different Kubernetes clusters, including how geographically distributing CPU and GPU clusters improves reliability, latency, and resource availability.

Finally, it examines the cluster scheduler's role in placement and scaling decisions, analyzes the benefits of the overall approach, and introduces a reference implementation you can use to apply these concepts to your own GenAI projects.
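The placement decision a resource-aware, policy-based cluster scheduler makes can be illustrated with a small filter-and-score function. This is a minimal sketch under stated assumptions, not the session's reference implementation: the `Cluster` and `Workload` fields, the cluster names, and the policy itself (CPU-only services such as query and ingestion avoid clusters with free GPUs so those stay available for prediction services; lowest latency breaks ties) are hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    """Hypothetical snapshot of one Kubernetes cluster's spare capacity."""
    name: str
    free_cpu: float    # unallocated CPU cores
    free_gpu: int      # unallocated whole GPUs (0 for a CPU-only cluster)
    latency_ms: float  # estimated latency to the clients this workload serves


@dataclass
class Workload:
    """Hypothetical resource request for one replica of a service."""
    name: str
    cpu: float
    gpu: int
    max_latency_ms: float


def place(workload: Workload, clusters: list[Cluster]) -> Optional[str]:
    """Pick a cluster for one replica and reserve its resources.

    Filter: keep clusters with enough spare CPU/GPU and acceptable latency.
    Score: CPU-only workloads prefer clusters without free GPUs (keeping GPUs
    for prediction services); ties are broken by lowest latency.
    Returns the chosen cluster name, or None if no cluster fits.
    """
    feasible = [
        c for c in clusters
        if c.free_cpu >= workload.cpu
        and c.free_gpu >= workload.gpu
        and c.latency_ms <= workload.max_latency_ms
    ]
    if not feasible:
        return None
    # False sorts before True, so non-GPU clusters win for CPU-only workloads.
    best = min(
        feasible,
        key=lambda c: (workload.gpu == 0 and c.free_gpu > 0, c.latency_ms),
    )
    # Reserve the capacity so later placements (e.g. scale-out replicas) see it as used.
    best.free_cpu -= workload.cpu
    best.free_gpu -= workload.gpu
    return best.name


if __name__ == "__main__":
    clusters = [
        Cluster("gpu-us-east", free_cpu=64, free_gpu=8, latency_ms=20),
        Cluster("cpu-us-east", free_cpu=128, free_gpu=0, latency_ms=15),
        Cluster("gpu-eu-west", free_cpu=64, free_gpu=4, latency_ms=90),
    ]
    query = Workload("query", cpu=4, gpu=0, max_latency_ms=50)
    prediction = Workload("prediction", cpu=8, gpu=2, max_latency_ms=100)
    print(place(query, clusters))  # cpu-us-east
    # Scale the prediction service out to five replicas; the fifth spills to eu-west.
    print([place(prediction, clusters) for _ in range(5)])
```

The filter-then-score structure mirrors the filtering and scoring phases kube-scheduler uses to place pods inside a single cluster; here it is applied one level up, to choose among clusters. A real multicluster scheduler would presumably read live capacity from each cluster rather than from static numbers like these.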

Syllabus

Resource-Aware Scheduling for Production GenAI with RAG running on Multicluster Cloud Kubernetes


Taught by

CNCF [Cloud Native Computing Foundation]
