LLMOps: OpenVINO Toolkit INT4 Quantization of Llama 3.2 3B and Inference on CPU
Offered By: The Machine Learning Engineer via YouTube
Course Description
Overview
Learn how to convert the Llama 3.2 3-billion-parameter model to OpenVINO IR format and quantize it to 4-bit integer (INT4) precision. Follow along as the process of model conversion and quantization is demonstrated step by step. Discover how to perform inference on a CPU with the optimized model using Chain of Thought (CoT) prompts. Access the accompanying Jupyter notebook for hands-on practice and a deeper understanding of the LLMOps techniques covered in this 26-minute tutorial on data science and machine learning.
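The tutorial performs the quantization with the OpenVINO toolchain itself, but the underlying idea is simple: each group of weights is mapped onto the INT4 range [-8, 7] with a per-group scale. The sketch below (an assumption for illustration, not the toolkit's code; the function names are hypothetical) shows symmetric group-wise INT4 quantization of a weight matrix in NumPy:

```python
import numpy as np

def quantize_int4_groupwise(weights, group_size=64):
    """Symmetric group-wise INT4 quantization (illustrative sketch).

    Splits the flattened weights into groups, computes one scale per
    group, and rounds each weight to an integer in [-8, 7].
    """
    flat = weights.reshape(-1, group_size)
    # One scale per group: the largest magnitude maps to the INT4 limit 7.
    scales = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(flat / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_int4(q, scales, shape):
    """Reconstruct approximate float weights from INT4 values and scales."""
    return (q.astype(np.float32) * scales).reshape(shape)

# Demo on a random weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(128, 128)).astype(np.float32)
q, s = quantize_int4_groupwise(w)
w_hat = dequantize_int4(q, s, w.shape)
max_err = float(np.abs(w - w_hat).max())
```

Storing 4-bit integers plus a small float scale per group is what shrinks a 3B-parameter model enough for CPU inference, at the cost of the rounding error measured by `max_err`.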
Syllabus
LLMOps: OpenVino Toolkit quantization 4int LLama3.2 3B, Inference CPU #datascience #machinelearning
Taught by
The Machine Learning Engineer
Related Courses
LLaMA: Open and Efficient Foundation Language Models - Paper Explained (Yannic Kilcher via YouTube)
Alpaca & LLaMA - Can It Compete with ChatGPT? (Venelin Valkov via YouTube)
Experimenting with Alpaca & LLaMA (Aladdin Persson via YouTube)
What's LLaMA? ChatLLaMA? - And Some ChatGPT/InstructGPT (Aladdin Persson via YouTube)
Llama Index - Step by Step Introduction (echohive via YouTube)