YoVDO

Removing Obstacles before Breaking Through the Memory Wall: A Close Look at HBM Errors in the Field

Offered By: USENIX via YouTube

Tags

Data Centers Courses
DRAM Courses

Course Description

Overview

Explore a study of high-bandwidth memory (HBM) errors in this 21-minute conference talk from USENIX ATC '24. The talk presents the first systematic analysis of HBM errors, covering over 460 million error events collected from nineteen data centers over a two-year period. It shows how HBM's stacked architecture, while promising for overcoming the memory wall, introduces new reliability challenges, and how HBM error patterns differ from those of conventional DRAM in spatial locality, temporal correlation, and sensor metrics. It also explains why traditional DRAM error prediction models fall short for HBM. The talk then introduces Calchas, a hierarchical failure prediction framework designed specifically for HBM that integrates spatial, temporal, and sensor information from various device levels. It closes by examining the feasibility of failure prediction across hierarchical levels and the implications for future memory technologies.

Syllabus

USENIX ATC '24 - Removing Obstacles before Breaking Through the Memory Wall: A Close Look at HBM...


Taught by

USENIX

Related Courses

SIGCOMM 2020 - TEA - Enabling State Intensive Network Functions on Programmable Switches
Association for Computing Machinery (ACM) via YouTube
Cold Boot Attack on DDR2 and DDR3 RAM
nullcon via YouTube
Exploring the Design Space of Page Management for Multi-Tiered Memory Systems
USENIX via YouTube
On-Chip Randomization for Memory Protection Against Hardware Supply Chain Attacks to DRAM
IEEE via YouTube
CSI - Rowhammer - Closing the Case of Half-Double and Beyond
Black Hat via YouTube