The nanoPU - A Nanosecond Network Stack for Datacenters
Offered By: USENIX via YouTube
Course Description
Overview
Explore a groundbreaking NIC-CPU co-design called the nanoPU, aimed at accelerating datacenter applications that rely on numerous small Remote Procedure Calls (RPCs) with microsecond-scale processing times. Delve into the innovative fast path that bypasses the cache and memory hierarchy, placing incoming messages directly into the CPU register file. Discover the programmable hardware support for low-latency transport, congestion control, and efficient RPC load balancing across cores. Learn about the hardware-accelerated thread scheduler that makes sub-nanosecond decisions, optimizing CPU utilization and minimizing RPC tail response times. Examine the FPGA prototype built by modifying an open-source RISC-V CPU and evaluate its performance through cycle-accurate simulations on AWS FPGAs. Compare the nanoPU's wire-to-wire RPC response time of just 69ns to commercial NICs, and understand how it significantly improves RPC tail response time and the load the system can sustain. Investigate the implementation and evaluation of applications like MICA, Raft, and Set Algebra for document retrieval, and learn how the nanoPU serves as a high-performance, programmable alternative for one-sided RDMA operations.
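The load-balancing and register-file ideas above can be illustrated with a toy software model. This is a hedged sketch, not the actual hardware design: the class name `NanoPUModel`, the join-shortest-queue steering policy, and the `net_rx` method are illustrative assumptions standing in for the NIC's core-selection logic and the nanoPU's register-mapped network interface.

```python
from collections import deque

class NanoPUModel:
    """Toy model of nanoPU-style RPC handling (illustrative only).

    The NIC steers each incoming message to the core with the shortest
    receive queue, and each core reads messages "directly" rather than
    through the cache/memory hierarchy.
    """

    def __init__(self, n_cores):
        # One hardware-managed RX queue per core.
        self.rx_queues = [deque() for _ in range(n_cores)]

    def steer(self, msg):
        # Hardware-style load-balancing decision: pick the least-loaded
        # core (a simple join-shortest-queue stand-in for the real policy).
        core = min(range(len(self.rx_queues)),
                   key=lambda c: len(self.rx_queues[c]))
        self.rx_queues[core].append(msg)
        return core

    def net_rx(self, core):
        # Stands in for reading the head of the RX queue straight into a
        # CPU register, bypassing DRAM and caches entirely.
        return self.rx_queues[core].popleft()

if __name__ == "__main__":
    nic = NanoPUModel(n_cores=2)
    cores = [nic.steer(m) for m in ["a", "b", "c", "d"]]
    print(cores)            # messages alternate across the two idle cores
    print(nic.net_rx(0))    # core 0 reads its first message
```

In the real design these decisions happen in hardware at nanosecond timescales; the sketch only shows the shape of the policy, not its performance.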
Syllabus
Introduction
Trends
The nanoPU
Prototype
Applications
One-sided RDMA
Conclusion
Taught by
USENIX