Building a High Performance Network in the Public Cloud Using RDMA - First Principles
Offered By: Oracle via YouTube
Course Description
Overview
Explore how Oracle Cloud Infrastructure (OCI) architects use Remote Direct Memory Access (RDMA) to deliver high-performance, low-latency networking in this 40-minute video from Oracle's First Principles series. The talk covers what RDMA is, its history at OCI, and why it is challenging. Learn why RoCE (RDMA over Converged Ethernet) matters, what its pitfalls are, and how OCI overcomes them. Discover how OCI tailors quality of service (QoS) for multiple workloads, tunes Explicit Congestion Notification (ECN) for HPC, GPU, and database workloads, and why it needs a separate RDMA network. Gain insights into performance optimizations, including flow-aware traffic distribution and traffic locality optimization. Understand what differentiates OCI's RDMA network and how it balances scale and latency for demanding workloads.
Syllabus
Introduction to OCI Cluster Networks
What is RDMA?
History of RDMA at OCI
Why is RDMA Challenging?
Importance of RoCE
Pitfalls of RoCE
Overcoming Pitfalls of RoCE
Limited use of PFC
Tailored QoS for multiple workloads
How to use ECN in RDMA networks
Tuning ECN to HPC workloads
Tuning ECN to GPU and DB workloads
Are OCI Cluster Networks in the same network?
Why do we need a separate RDMA network?
Performance optimizations for workloads
Flow aware traffic distribution
Traffic locality optimization
Traffic topology information vending service
Why the OCI RDMA network is better and differentiated
Balancing scale and latency
Taught by
Oracle
Related Courses
Windows Server 2019: Advanced Networking Features (LinkedIn Learning)
Deep Dive into GPU Support in Apache Spark 3.x - Accelerator-Aware Scheduling and RAPIDS Plugin (Databricks via YouTube)
Microsecond Consensus for Microsecond Applications (USENIX via YouTube)
An Edge-Queued Datagram Service for All Datacenter Traffic (USENIX via YouTube)
Database Consolidation With Persistent Memory (Oracle via YouTube)