Extremely Low-Bit Quantization for Transformers - tinyML Asia 2021

Offered By: tinyML via YouTube

Tags

Quantization Courses, Embedded Systems Courses, Edge Computing Courses, Transformer Models Courses, Matrix Multiplication Courses, Model Compression Courses

Course Description

Overview

Explore extremely low-bit quantization techniques for Transformers in this tinyML Asia 2021 conference talk. The talk addresses the challenges of deploying the Transformer architecture on resource-limited devices and presents quantization strategies suited to them. It examines how each Transformer block contributes to model accuracy and to inference computation, and shows that individual words within an embedding block affect accuracy to different degrees. Based on these observations, it proposes a mixed-precision quantization approach that represents Transformer weights with fewer than 3 bits, including a method that assigns a different number of quantization bits to each word in an embedding block according to its statistical properties. It also introduces a matrix multiplication kernel that eliminates the dequantization step. Topics include computing system design, uniform quantization schemes, critical problems in quantization, and the Transformer structure. The talk closes with quantization results, latency improvements, and a Q&A session on optimizing Transformer models for mobile and edge devices.
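The syllabus lists "BCQ", which presumably stands for binary-coding quantization: approximating a weight vector as a sum of a few scaled vectors whose entries are all +1 or -1. The following is a minimal sketch of a greedy version of that idea, not the method presented in the talk. The function names (bcq_greedy, bcq_reconstruct, squared_error) and the greedy fitting rule are assumptions made for illustration.

```python
# Illustrative sketch only: greedy binary-coding quantization of one weight vector.
# All names here are hypothetical and are not taken from the talk.

def bcq_greedy(weights, num_bits):
    """Approximate weights as sum_i alphas[i] * codes[i], with codes[i] in {-1, +1}^n."""
    residual = [float(w) for w in weights]
    alphas, codes = [], []
    for _ in range(num_bits):
        # The binary code is the sign pattern of what is still unexplained.
        code = [1 if r >= 0 else -1 for r in residual]
        # For that sign pattern, the least-squares scale is the mean absolute residual.
        alpha = sum(abs(r) for r in residual) / len(residual)
        alphas.append(alpha)
        codes.append(code)
        residual = [r - alpha * b for r, b in zip(residual, code)]
    return alphas, codes


def bcq_reconstruct(alphas, codes):
    """Rebuild the approximate weights from the scales and binary codes."""
    n = len(codes[0])
    return [sum(a * c[j] for a, c in zip(alphas, codes)) for j in range(n)]


def squared_error(x, y):
    """Sum of squared differences between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


if __name__ == "__main__":
    w = [0.9, -0.4, 0.1, -1.2]
    for bits in (1, 2, 3):
        alphas, codes = bcq_greedy(w, bits)
        approx = bcq_reconstruct(alphas, codes)
        print(bits, "bit(s): squared error =", squared_error(w, approx))
```

Each extra bit adds one more scale and sign pattern, so the reconstruction error cannot increase as bits are added. Because the codes contain only +1 and -1, a product with an input vector reduces to signed sums of the inputs followed by a few multiplications by the scales, which is one plausible reason such a format can be used without first dequantizing the weights.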

Syllabus

Introduction
Computing system design
Transformer architecture
Uniform quantization
Uniform quantization scheme
Uniform quantization limits
Is it still useful
BCQ
Example
Critical problems
Lookup table
Transformer structure
Quantizing embedding layers
Mixed precision quantization
Encoder and Decoder
Retraining
Quantization Results
Latency Improvements
Quantization
Q&A
Strategic Partners


Taught by

tinyML

Related Courses

TensorFlow Lite for Edge Devices - Tutorial
freeCodeCamp
Few-Shot Learning in Production
HuggingFace via YouTube
TinyML Talks Germany - Neural Network Framework Using Emerging Technologies for Screening Diabetic
tinyML via YouTube
TinyML for All: Full-stack Optimization for Diverse Edge AI Platforms
tinyML via YouTube
TinyML Talks - Software-Hardware Co-design for Tiny AI Systems
tinyML via YouTube