Twine - A Unified Cluster Management System for Shared Infrastructure

Offered By: USENIX via YouTube

Tags

OSDI (Operating Systems Design and Implementation) Courses
Cluster Management Courses
Data Center Management Courses
Twine Courses

Course Description

Overview

Explore a presentation on Twine, Facebook's cluster management system for shared infrastructure. Twine manages one million machines across multiple data centers in a geographic region through a single control plane. Learn how the TaskControl API enables application-specific customization and how host profiles tune hardware and OS settings for diverse workloads. Understand why Facebook deploys power-efficient small machines universally and relies on autoscaling to improve utilization. The talk also covers the challenges of migrating workloads onto shared infrastructure, the lessons learned from running the system at this scale, and how Twine's design choices differ from conventional practice in performance, efficiency, and resource management.

Syllabus

Intro
Data center geographic regions
What design decisions did Twine make differently?
What if we used Kubernetes?
How does Twine avoid stranded capacity?
How does Twine perform fleet-wide optimization?
How does Twine perform fleet-wide optimization for an entire geographic region?
How well does the Twine scheduler scale?
How do we mitigate risks with 1M machines per deployment?
Private pools or shared infrastructure?
What is host customization?
What is the overhead for host profile switches?
What drives host profile changes?
What are the challenges with supporting ubiquitous shared infrastructure?
Challenge: Tasks are not homogeneous
How does Twine collaborate with applications?
What is our shared infrastructure adoption?
How easy is it to migrate onto shared infrastructure?
Power is our most constrained resource
Big machines or small machines?
Why use small machines?
How much do we save by using small machines?
What lessons did we learn using small machines?
Conclusion


Taught by

USENIX

Related Courses

Adobe Experience Manager and MongoDB
MongoDB University
Elastic Cloud Infrastructure: Containers and Services auf Deutsch
Google Cloud via Coursera
Architecting with Google Kubernetes Engine: Foundations en Français
Google Cloud via Coursera
Kubernetes Hands-On - Deploy Microservices to the AWS Cloud
Udemy
Docker Swarm: BEGINNER + ADVANCED
Udemy