YoVDO

Scalable MatMul-free Language Modeling - Paper Explained

Offered By: Yannic Kilcher via YouTube

Tags

- Language Models Courses
- FPGA Courses
- Quantization Courses
- Attention Mechanisms Courses
- Scaling Laws Courses
- Matrix Multiplication Courses
- Hardware Acceleration Courses

Course Description

Overview

A video walkthrough of a research paper proposing MatMul-free language models. The paper replaces costly matrix multiplications with quantization and recurrence, aiming to maintain performance while reducing computational cost. The video covers ternary accumulation as a substitute for matrix multiplication, the replacement of attention layers with recurrent layers, and the use of ternary channel mixing instead of dense layers. It then examines the language modeling results, scaling laws, and other experimental outcomes presented in the paper, and discusses the potential for more efficient large language models and the implications for future hardware accelerators designed to run lightweight LLMs.
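To make the idea of ternary accumulation concrete, here is a minimal sketch, not taken from the paper or the video. It assumes the weights have already been quantized to the values -1, 0, and +1, and it omits any scaling factors or quantization step a real model would need. The function names `ternary_matvec` and `dense_matvec` are hypothetical and used only for illustration. Under these assumptions, a matrix-vector product reduces to additions and subtractions:

```python
def ternary_matvec(x, w):
    """Compute y = x @ w for a weight matrix w whose entries are all
    -1, 0, or +1, using only additions and subtractions.

    Illustrative sketch only: assumes w is already ternary.
    """
    n_out = len(w[0])
    y = [0.0] * n_out
    for i, xi in enumerate(x):
        row = w[i]
        for j in range(n_out):
            if row[j] == 1:
                y[j] += xi      # weight +1: accumulate the input
            elif row[j] == -1:
                y[j] -= xi      # weight -1: subtract the input
            # weight 0: the input contributes nothing, so skip it
    return y


def dense_matvec(x, w):
    """Reference implementation using ordinary multiply-accumulate."""
    return [sum(x[i] * w[i][j] for i in range(len(x)))
            for j in range(len(w[0]))]


x = [0.5, -2.0, 3.0]
w = [
    [1, 0, -1],
    [-1, 1, 0],
    [0, -1, 1],
]

print(ternary_matvec(x, w))  # [2.5, -5.0, 2.5]
print(dense_matvec(x, w))    # [2.5, -5.0, 2.5]
```

Both functions give the same result here, but the ternary version never multiplies, which is the property that makes such layers attractive for lightweight hardware.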

Syllabus

- Intro
- MatMul is everywhere
- Ternary accumulation as a substitute for matrix multiplication
- Replacing attention layers with recurrent layers
- Replacing dense layers with ternary channel mixing
- Language modeling results & scaling laws
- Other experimental results
- Conclusion


Taught by

Yannic Kilcher

Related Courses

- FPGA computing systems: Partial Dynamic Reconfiguration (Politecnico di Milano via Polimi OPEN KNOWLEDGE)
- Introduction to Amazon Elastic Inference (Pluralsight)
- FPGA computing systems: Partial Dynamic Reconfiguration (Politecnico di Milano via Coursera)
- Introduction to Amazon Elastic Inference (Traditional Chinese) (Amazon Web Services via AWS Skill Builder)
- Introduction to Amazon Elastic Inference (Portuguese) (Amazon Web Services via AWS Skill Builder)