
A Chiplet-Based Generative Inference Architecture with Block Floating Point Datatypes

Offered By: Scalable Parallel Computing Lab, SPCL @ ETH Zurich via YouTube

Tags

Transformer Models, PyTorch, Quantization, Deep Reinforcement Learning, Chiplets

Course Description

Overview

Explore a comprehensive conference talk on a chiplet-based generative inference architecture and block floating point datatypes for AI acceleration. Delve into modular, spatial CGRA-like architectures optimized for generative inference, and learn about deep reinforcement learning (RL)-based mappers in compilers for spatial and temporal architectures. Discover weight and activation quantization techniques in block floating point formats, building on GPTQ and SmoothQuant, and their implementation in PyTorch. Examine an extension to EL-attention that reduces KV cache size and bandwidth. Gain insights from speaker Sudeep Bhoja in this SPCL_Bcast #38 recording from ETH Zurich's Scalable Parallel Computing Lab, featuring an in-depth presentation followed by announcements and a Q&A session.
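To make the quantization topic concrete, here is a minimal PyTorch sketch of block floating point (BFP) fake quantization, in which each block of values shares a single power-of-two exponent. The block size, bit width, and function name are illustrative assumptions, not the implementation presented in the talk.

```python
import torch

def bfp_quantize(x: torch.Tensor, block_size: int = 16, mantissa_bits: int = 8) -> torch.Tensor:
    """Fake-quantize x to block floating point: each block shares one exponent.
    Block size and bit width are illustrative, not from the talk."""
    n = x.numel()
    pad = (-n) % block_size
    blocks = torch.nn.functional.pad(x.flatten(), (0, pad)).view(-1, block_size)

    # Shared power-of-two exponent per block, taken from the largest magnitude.
    max_abs = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-30)
    exponent = torch.floor(torch.log2(max_abs))

    # Weight of one mantissa LSB, chosen so the block maximum fits the mantissa range.
    scale = torch.exp2(exponent - (mantissa_bits - 2))
    qmax = 2 ** (mantissa_bits - 1) - 1

    # Round mantissas to integers, saturate at the edges, dequantize back to float.
    mantissa = torch.round(blocks / scale).clamp(-qmax - 1, qmax)
    return (mantissa * scale).view(-1)[:n].view_as(x)

x = torch.randn(4, 256)
err = (x - bfp_quantize(x)).abs().max().item()
print(f"max abs quantization error: {err:.2e}")
```

Because only one exponent is stored per block, BFP keeps most of integer arithmetic's hardware efficiency while tracking the dynamic range of floating point across blocks.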
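For context on why reducing KV cache size and bandwidth matters, a back-of-the-envelope calculation of the cache footprint for a hypothetical dense-attention decoder; the model shape below is an assumption for illustration, not a system described in the talk.

```python
def kv_cache_bytes(layers: int, heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int) -> int:
    # Keys and values (factor 2) are cached for every layer, head, and token.
    return 2 * layers * heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical 70B-class shape: 80 layers, 64 heads of dim 128, fp16, 4096-token context
size = kv_cache_bytes(layers=80, heads=64, head_dim=128,
                      seq_len=4096, batch=1, bytes_per_elem=2)
print(f"{size / 2**30:.1f} GiB")  # 10.0 GiB read through attention at each decoding step
```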

Syllabus

Introduction
Talk
Announcements
Q&A Session


Taught by

Scalable Parallel Computing Lab, SPCL @ ETH Zurich

Related Courses

Evaluating Chiplet-based Large-Scale Interconnection Networks via Cycle-Accurate Packet-Parallel Simulation
USENIX via YouTube
Post-Moore Spatial Computing: From Chips to Clusters
Scalable Parallel Computing Lab, SPCL @ ETH Zurich via YouTube
HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement
Scalable Parallel Computing Lab, SPCL @ ETH Zurich via YouTube
Berkeley Lab's Breakthroughs in Exascale Supercomputing and AI Energy Efficiency
SAIConference via YouTube