YoVDO

Quantization Fundamentals with Hugging Face

Offered By: DeepLearning.AI via Coursera

Tags

Quantization Courses, Generative AI Courses, Model Compression Courses

Course Description

Overview

Generative AI models, like large language models, often exceed the capabilities of consumer-grade hardware and are expensive to run. Compressing models through methods such as quantization makes them more efficient, faster, and more accessible, allowing them to run on a wide variety of devices, including smartphones, personal computers, and edge devices, with minimal performance degradation.

Join this course to:

  1. Quantize any open-source model with linear quantization using the Quanto library (see the first sketch below).
  2. Get an overview of how linear quantization is implemented. This form of quantization can be applied to compress any model, including LLMs and vision models.
  3. Apply "downcasting," another form of quantization, with the Transformers library, which lets you load models at about half their normal size in the BFloat16 data type.

By the end of this course, you will have a foundation in quantization techniques and be able to apply them to compress and optimize your own generative AI models, making them more accessible and efficient.
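To give a feel for point 1, here is a minimal sketch of linear quantization with Quanto. It assumes the library's current optimum-quanto packaging (earlier releases imported from quanto instead), and the model name is an arbitrary small model chosen only for illustration:

    import torch
    from transformers import AutoModelForCausalLM
    from optimum.quanto import quantize, freeze, qint8

    # Illustrative small model; any Transformers model works the same way
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-410m")

    # Swap linear layers for quantized equivalents, with int8 weights
    quantize(model, weights=qint8)

    # Materialize the quantized weights in place
    freeze(model)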
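For point 2, the core idea of linear quantization can be written out directly. The sketch below implements asymmetric per-tensor affine quantization, x ≈ scale * (q - zero_point), in plain PyTorch; it is a conceptual illustration of the technique, not the Quanto library's internal code:

    import torch

    def linear_quantize(x: torch.Tensor, num_bits: int = 8):
        # Map the float range [x.min(), x.max()] onto the integer range [0, 2^bits - 1]
        qmin, qmax = 0, 2 ** num_bits - 1
        scale = (x.max() - x.min()).item() / (qmax - qmin)
        zero_point = int(round(qmin - x.min().item() / scale))
        q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax).to(torch.uint8)
        return q, scale, zero_point

    def linear_dequantize(q: torch.Tensor, scale: float, zero_point: int) -> torch.Tensor:
        # Recover an approximation of the original float tensor
        return scale * (q.float() - zero_point)

    x = torch.randn(4, 4)
    q, s, z = linear_quantize(x)
    print((x - linear_dequantize(q, s, z)).abs().max())  # quantization error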
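For point 3, downcasting with the Transformers library comes down to the torch_dtype argument of from_pretrained, which loads the weights in BFloat16 rather than float32; the model name is again an arbitrary illustrative choice:

    import torch
    from transformers import AutoModelForCausalLM

    # Loading in BFloat16 halves the memory of a float32 checkpoint
    model = AutoModelForCausalLM.from_pretrained(
        "EleutherAI/pythia-410m", torch_dtype=torch.bfloat16
    )
    print(model.get_memory_footprint())  # roughly half the float32 footprint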

Syllabus

  • Quantization Fundamentals with Hugging Face

Taught by

Younes Belkada and Marc Sun

Related Courses

Bayes Classifier on Dataproc
Google via Google Cloud Skills Boost
Llama for Python Programmers
University of Michigan via Coursera
Quantization in Depth
DeepLearning.AI via Coursera
Working with Llama 3
DataCamp
Digital Signal Processing
École Polytechnique Fédérale de Lausanne via Coursera