Thunderbolt - Throughput-Optimized, Quality-of-Service-Aware Power Capping at Scale
Offered By: USENIX via YouTube
Course Description
Overview
Explore a conference talk on Thunderbolt, a hardware-agnostic power capping system designed for hyperscale data centers. Learn about the challenges of power oversubscription and the need for task-level quality-of-service differentiation in modern compute clusters. Discover how Thunderbolt ensures safe power oversubscription while minimizing impact on both throughput-oriented and latency-sensitive tasks. Examine the system's architecture, mechanisms, and policies, including its two-threshold control policy and use of CPU bandwidth control. Understand the benefits of Thunderbolt's reactive and proactive capping approaches, and see real-world deployment results in production clusters. Gain insights into power efficiency improvements and the potential for significant power oversubscription gains in data center environments.
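The overview mentions a two-threshold control policy enforced through CPU bandwidth control. This listing does not give the talk's actual thresholds or throttling amounts, so the following is only a minimal Python sketch under assumed semantics. Below the lower power threshold nothing is capped. Between the two thresholds only throughput-oriented tasks are throttled. Above the upper threshold every task is throttled. The names `Task`, `shaped_limit`, `cpu_max`, `batch_frac`, and `severe_frac`, and the fraction values, are hypothetical. The resulting core count is formatted as a Linux cgroup v2 `cpu.max` value ("<quota_us> <period_us>"), which is one standard way to express CPU bandwidth control.

```python
from dataclasses import dataclass

# Default CFS bandwidth period on Linux (100 ms), in microseconds.
CFS_PERIOD_US = 100_000


@dataclass
class Task:
    name: str
    latency_sensitive: bool  # True for serving tasks, False for throughput-oriented batch
    cpu_limit: float         # nominal CPU limit, in cores


def shaped_limit(task, power_w, low_w, high_w, batch_frac=0.5, severe_frac=0.25):
    """CPU limit (cores) for `task` under a HYPOTHETICAL two-threshold policy."""
    if power_w < low_w:
        # Below both thresholds: no capping at all.
        return task.cpu_limit
    if power_w < high_w:
        # Between the thresholds: throttle only throughput-oriented tasks.
        return task.cpu_limit if task.latency_sensitive else task.cpu_limit * batch_frac
    # At or above the upper threshold: throttle every task.
    return task.cpu_limit * severe_frac


def cpu_max(cores, period_us=CFS_PERIOD_US):
    """Format a core count as a cgroup v2 cpu.max value: "<quota_us> <period_us>"."""
    return f"{int(cores * period_us)} {period_us}"


if __name__ == "__main__":
    batch = Task("logs", latency_sensitive=False, cpu_limit=8.0)
    # Power between the thresholds: the batch task drops to 4 cores.
    print(cpu_max(shaped_limit(batch, power_w=1050, low_w=1000, high_w=1100)))
```

Run as a script, this prints `400000 100000`, that is, 4 cores' worth of quota per 100 ms period. Keying the decision on task class rather than on a single global limit is what lets a policy like this spare latency-sensitive work until power becomes critical.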
Syllabus
Intro
Motivation: power oversubscription and capping
Motivation: task QoS differentiation
Prior industry solutions did not meet our needs
Architecture
Mechanism and policy details
Why not RAPL or DVFS?
CPU bandwidth control, DVFS, RAPL on Intel Skylake CPU
Reactive capping policy: load shaping
Load shaping on a production cluster
Proactive capping mechanism: CPU jailing
Deterministic machine CPU cap
20% CPU jailing on a production cluster
Proactive capping policy: risk assessment
Deployed in logs processing clusters
Summary
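The syllabus lists CPU jailing as the proactive mechanism that yields a deterministic machine CPU cap, with a 20% setting shown on a production cluster. Assuming, without confirmation from this listing, that jailing withholds a fixed percentage of a machine's logical CPUs from tasks (for example by restricting them to a cpuset), the cap follows from simple arithmetic. The function names below are hypothetical, and the choice to jail the highest-numbered CPUs is arbitrary. Integer arithmetic is used so that rounding never lets usage exceed the intended cap.

```python
def jailed_cpus(n_cpus: int, jail_pct: int) -> int:
    """Logical CPUs withheld from tasks, rounded UP so the cap is never exceeded."""
    if n_cpus < 1 or not 0 <= jail_pct < 100:
        raise ValueError("need n_cpus >= 1 and 0 <= jail_pct < 100")
    return -(-n_cpus * jail_pct // 100)  # integer ceiling division


def allowed_cpuset(n_cpus: int, jail_pct: int) -> str:
    """cpuset.cpus-style list of CPUs left to tasks (the lowest-numbered ones)."""
    usable = n_cpus - jailed_cpus(n_cpus, jail_pct)
    return "0" if usable == 1 else f"0-{usable - 1}"


def machine_cpu_cap(n_cpus: int, jail_pct: int) -> float:
    """Upper bound on machine CPU utilization once jailing is in effect."""
    return (n_cpus - jailed_cpus(n_cpus, jail_pct)) / n_cpus


if __name__ == "__main__":
    # A hypothetical 96-CPU machine with 20% jailing.
    print(jailed_cpus(96, 20), allowed_cpuset(96, 20))
```

On the hypothetical 96-CPU machine this prints `20 0-75`: 19.2 CPUs round up to 20 jailed CPUs, so tasks may use at most 76 of 96 CPUs (about 79% of capacity). Because the bound depends only on the CPU count, not on how any task behaves, it is the kind of predictable limit a proactive policy can plan around.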
Taught by
USENIX