
Model Compression Courses

Quant-LLM: Accelerating Large Language Model Serving via FP6-Centric Algorithm-System Co-Design (USENIX via YouTube)
Implicit Deep Learning - Seminar Series (VinAI via YouTube)
Efficient AI: From Supercomputers to Smartphones (Scalable Parallel Computing Lab, SPCL @ ETH Zurich via YouTube)
How to Quantize a Large Language Model with GGUF or AWQ (Trelis Research via YouTube)
Unlock Faster and More Efficient LLMs with SparseGPT (Neural Magic via YouTube)
Pruning and Quantizing ML Models With One Shot Without Retraining (Neural Magic via YouTube)
Applying Second-Order Pruning Algorithms for SOTA Model Compression (Neural Magic via YouTube)
Sparse Training of Neural Networks Using AC/DC (Neural Magic via YouTube)
How Well Do Sparse Models Transfer? - Exploring Transfer Performance in Computer Vision and NLP (Neural Magic via YouTube)
Leveraging Pruning and Quantization for Efficient Real-Time Audio Applications (ADC - Audio Developer Conference via YouTube)