AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Offered By: MIT HAN Lab via YouTube

Tags

Model Compression Courses

Course Description

Overview

This 19-minute conference talk from MLSys 2024 presents the Best Paper "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration." It introduces Activation-aware Weight Quantization (AWQ), an approach developed by researchers from MIT HAN Lab for compressing and accelerating Large Language Models (LLMs), and discusses its potential to improve LLM efficiency. The talk covers the methodology, results, and implications of the work. Additional resources, including the project website, full paper, and code repository, are available for viewers who want to study AWQ further or apply it in their own projects.

Syllabus

MLSys'24 Best Paper - AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Taught by

MIT HAN Lab

Related Courses

TensorFlow Lite for Edge Devices - Tutorial
freeCodeCamp
Few-Shot Learning in Production
HuggingFace via YouTube
TinyML Talks Germany - Neural Network Framework Using Emerging Technologies for Screening Diabetic
tinyML via YouTube
TinyML for All: Full-stack Optimization for Diverse Edge AI Platforms
tinyML via YouTube
TinyML Talks - Software-Hardware Co-design for Tiny AI Systems
tinyML via YouTube