YoVDO

Kubernetes Deep Dive - Elevating ML Workload Monitoring to Art

Offered By: CNCF [Cloud Native Computing Foundation] via YouTube

Tags

Kubernetes Courses, Machine Learning Courses, Grafana Courses, Prometheus Courses, Observability Courses

Course Description

Overview

Dive deep into monitoring ML workloads on Kubernetes in this conference talk. Explore strategies for optimizing AI/ML workloads by combining node health assurance with advanced monitoring techniques. Learn how AWS Neuron integrates with problem detection and how Neuron Monitor is deployed for enhanced observability. Discover how to diagnose and resolve real-world issues in AI/ML clusters using detection and recovery mechanisms. Gain insights on using tools such as the Kubernetes Node Problem Detector, Prometheus, Grafana, and Amazon CloudWatch for in-depth performance analytics, with the aim of keeping Kubernetes environments for AI/ML applications resilient and transparent.

Syllabus

Kubernetes Deep Dive: Elevating ML Workload Monitoring to Art - Ziwen Ning & Geeta Gharpure


Taught by

CNCF [Cloud Native Computing Foundation]

Related Courses

Kubernetes Hands-On - Deploy Microservices to the AWS Cloud (Udemy)
Learn DevOps: Advanced Kubernetes Usage (Udemy)
Monitoring & Telemetry for Production Systems (Coursera Project Network via Coursera)
Kubernetes: Cloud Native Ecosystem (LinkedIn Learning)
Kubernetes: Monitoring with Prometheus (LinkedIn Learning)