Scalable MatMul-free Language Modeling - Paper Explained
Offered By: Yannic Kilcher via YouTube
Course Description
Overview
Explore a comprehensive video analysis of a research paper proposing MatMul-free language models. Delve into the innovative approach of replacing resource-intensive matrix multiplications with quantization and recurrence techniques to maintain performance while reducing computational costs. Learn about ternary accumulation as a substitute for matrix multiplication, the replacement of attention layers with recurrent layers, and the use of ternary channel mixing instead of dense layers. Examine the language modeling results, scaling laws, and other experimental outcomes presented in the paper. Gain insights into the potential for creating more efficient large language models and the implications for future hardware accelerators designed to process lightweight LLMs.
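As a rough illustration of the ternary accumulation idea mentioned above: when every weight is restricted to -1, 0, or +1, each multiply in a matrix-vector product reduces to an addition, a subtraction, or a skip. The sketch below is a minimal pure-Python example, not the paper's implementation. The function names `ternary_matvec` and `dense_matvec` and the sample values are hypothetical and chosen only for illustration.

```python
def ternary_matvec(x, w):
    """Compute y = x @ W for W with entries in {-1, 0, +1}, using only adds and subtracts."""
    n_out = len(w[0])
    y = [0.0] * n_out
    for i, xi in enumerate(x):
        for j in range(n_out):
            t = w[i][j]
            if t == 1:
                y[j] += xi      # weight +1: accumulate the input
            elif t == -1:
                y[j] -= xi      # weight -1: subtract the input
            # weight 0: the input contributes nothing
    return y


def dense_matvec(x, w):
    """Reference version of the same product using ordinary multiplications."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


# Hypothetical example values (not taken from the paper).
x = [0.5, -2.0, 3.0]
W = [
    [1, 0, -1],
    [-1, 1, 0],
    [0, -1, 1],
]

print(ternary_matvec(x, W))  # [2.5, -5.0, 2.5]
print(dense_matvec(x, W))    # [2.5, -5.0, 2.5]
```

Both functions return the same result; the ternary version simply never multiplies, which is the property that custom hardware for lightweight LLMs could exploit.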
Syllabus
- Intro
- MatMul is everywhere
- Ternary accumulation as a substitute for matrix multiplication
- Replacing attention layers with recurrent layers
- Replacing dense layers with ternary channel mixing
- Language modelling results & scaling laws
- Other experimental results
- Conclusion
Taught by
Yannic Kilcher
Related Courses
- Introduction To Mechanical Micro Machining (Indian Institute of Technology, Kharagpur via Swayam)
- Biomaterials - Intro to Biomedical Engineering (Udemy)
- OpenAI Whisper - Robust Speech Recognition via Large-Scale Weak Supervision (Aleksa Gordić - The AI Epiphany via YouTube)
- Turbulence as Gibbs Statistics of Vortex Sheets - Alexander Migdal (Institute for Advanced Study via YouTube)
- City Analytics - Professor Peter Grindrod CBE (Alan Turing Institute via YouTube)