Structured Quantization for Neural Network Language Model Compression

Offered By: tinyML via YouTube

Tags

Language Models, Speech Recognition, Word Embeddings

Course Description

Overview

Explore a 32-minute conference talk from tinyML Asia 2020 on structured quantization techniques for neural network language model compression. Delve into the challenge of large memory consumption in resource-constrained scenarios and discover how structured quantization methods can achieve compression ratios of 70-100× without compromising performance. Learn about compression approaches including pruning, fixed-point quantization, product quantization, and binarization. Examine their impact on speech recognition performance and compare results with full-precision models. Gain insights into applying these techniques to word embeddings and neural network architectures for natural language processing and speech recognition.
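The talk's own implementation is not reproduced here, but as a rough illustration of the product-quantization idea the description refers to, the sketch below compresses a word-embedding matrix by splitting each vector into sub-vectors and replacing each sub-vector with a one-byte codebook index learned by plain k-means. The function names, parameters, and training loop are illustrative assumptions for a minimal NumPy example, not the speaker's method.

```python
import numpy as np


def product_quantize(embeddings, num_subvectors=4, num_centroids=256, iters=20, seed=0):
    """Compress an embedding matrix with product quantization (illustrative sketch).

    Each row is split into `num_subvectors` chunks; every chunk is replaced by the
    index of its nearest centroid in a small per-chunk codebook. Storage drops from
    `dim` floats per word to `num_subvectors` one-byte codes plus shared codebooks.
    Assumes the vocabulary is at least `num_centroids` rows.
    """
    vocab, dim = embeddings.shape
    assert dim % num_subvectors == 0, "embedding dim must split evenly"
    sub_dim = dim // num_subvectors
    rng = np.random.default_rng(seed)

    codebooks = np.zeros((num_subvectors, num_centroids, sub_dim), dtype=np.float32)
    codes = np.zeros((vocab, num_subvectors), dtype=np.uint8)

    for s in range(num_subvectors):
        chunk = embeddings[:, s * sub_dim:(s + 1) * sub_dim]
        # initialise the codebook from randomly chosen sub-vectors
        centroids = chunk[rng.choice(vocab, num_centroids, replace=False)].copy()
        for _ in range(iters):
            # assign every sub-vector to its nearest centroid (squared L2 distance)
            dists = ((chunk[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
            assign = dists.argmin(axis=1)
            # move each centroid to the mean of its assigned sub-vectors
            for k in range(num_centroids):
                members = chunk[assign == k]
                if len(members):
                    centroids[k] = members.mean(axis=0)
        codebooks[s], codes[:, s] = centroids, assign

    return codebooks, codes


def reconstruct(codebooks, codes):
    """Rebuild an approximate embedding matrix from codes and codebooks."""
    return np.concatenate(
        [codebooks[s][codes[:, s]] for s in range(codes.shape[1])], axis=1
    )
```

With four one-byte codes per word, a 256-dimensional float32 embedding (1,024 bytes per word) shrinks to 4 bytes per word plus a shared codebook; the overall ratio in practice depends on vocabulary size, codebook size, and which other layers of the model are quantized.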

Syllabus

Introduction
Neural network vs NLP
Language model
Memory
Neural Network
Word Embedding
Neural Network Size
General Approach
Pruning
Quantization based approaches
Fixed point quantization
Product quantization
Speech recognition performance
Binarization
Embedding Matrix
Full Precision Model
Two Methods
Results
Conclusion
Question
Sponsors


Taught by

tinyML

Related Courses

Machine Learning Capstone: An Intelligent Application with Deep Learning
University of Washington via Coursera
Elaborazione del linguaggio naturale (Natural Language Processing)
University of Naples Federico II via Federica
Deep Learning for Natural Language Processing
University of Oxford via Independent
Deep Learning Summer School
Independent
Sequence Models
DeepLearning.AI via Coursera