Climbing the Summit with Open Source and POWER9
Offered By: linux.conf.au via YouTube
Course Description
Syllabus
Intro
maybe even the fastest in the world?
Who wants these machines?
OAK IBM POWER
Intel x86
Summit: Science research Astrophysics Materials Cancer Research Systems Biology
Titan 2012 27
Metric household
Summit: 13 MegaWatts
Summit: USD $200 Million
550 households
1 Sydney house
Summit: 300 km of cables
Sierra: National Nuclear Security Administration's Stockpile Stewardship Mission
How do you build this thing?
IBM 2 computers: • Infrastructure • Compute
POWER8 based?
100Gbps Networking
Mellanox CX-5
Hybrid approach: CPUs + GPUs
Compute: Witherspoon AC922
How do we build them?
Timelines?
Sierra release: December 2017
Infrastructure nodes are first
Linux • Firmware • Systems • GPU interfaces
24 Core SMT4
8 Billion transistors
POWER9 is a major refresh of POWER
Major Architectural changes: • Radix/Linux Based MMU • New interrupt controller • Direct attach DDR4 DIMMs
New Slice Microarchitecture
First through 14nm fab
POWER9 chip development
Minor releases too
DD1: January 2017
Planning for Linux and Firmware
Design: Radix MMU
Radix MMU: • Simpler • Better performance • KVM allocations
Simulation: • Functional • Cycle Accurate
Teach Linux basic feature
Bringup: Everything is broken
Get Linux up
Bringup: • Identify issues • Work around • Get out of the way • Find real fix
Develop items that need real hardware
Testing • More systems • Systems getting more sophisticated • Devs - Machines further separated
Release: Yay!
Staged release
POWER9 not backwards compatible with POWER8
IBM - RedHat strong relationship
IBM & RedHat partnered on RHEL7 for POWER8
Deliver Linux to customers
End of Moore's Law
Drive accelerators
Binary Linux kernel driver
Helped prove out: • Link training • Firmware
Coherent memory
CUDA Unified memory
Design • IOMMU looks like PCIe ATS • IOMMU directly uses Radix MMU
Simulation with P9
Bringup: March 2017
Testing: Data integrity
Baseboard Management Controller
Little computer that turns on your big computer
Firmware?
Infrastructure nodes: Supermicro based BMC
Compute node OpenBMC
Compute nodes first OpenBMC release
Like a distro
Features: • On/Off • Monitor
Solutions
Pervasive
So how did it end up?
Fastest computer in the world?
Taught by
linux.conf.au