YoVDO

LLMOps: Quantization Models and Inference with ONNX Generative Runtime

Offered By: The Machine Learning Engineer via YouTube

Tags

LLMOps Courses Data Science Courses Machine Learning Courses Quantization Courses Generative Models Courses Model Compression Courses ONNX Runtime Courses Phi-3 Courses

Course Description

Overview

Explore the world of LLMOps through a 30-minute video focusing on model quantization and inference using the ONNX Generative Runtime. Learn how to install ONNX Runtime with GPU support and perform inference with a generative model, specifically a Phi-3-mini-4k quantized to int4 (4-bit integers). Dive into the process of converting the original Phi-3-mini-128k into an int4 quantized version using the ONNX Runtime tooling. Access the accompanying notebook on GitHub to follow along and gain hands-on experience in this cutting-edge area of data science and machine learning.
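The install-then-infer workflow described above can be sketched roughly as follows. This is a hedged, minimal sketch based on the public `onnxruntime-genai` API, not the course's own notebook; the model directory name is a hypothetical placeholder, and the exact generator calls may differ between library versions.

```python
# Sketch: greedy generation with a 4-bit quantized Phi-3 model via
# onnxruntime-genai. Setup (run once, outside this script):
#   pip install onnxruntime-genai-cuda   # GPU build of the GenAI runtime
# Converting a Hugging Face checkpoint to int4 ONNX uses the bundled
# model builder, e.g. (command shape is an assumption from the docs):
#   python -m onnxruntime_genai.models.builder \
#       -m microsoft/Phi-3-mini-128k-instruct -o ./phi3-int4 -p int4 -e cuda
import os

# Hypothetical path to the exported int4 model folder.
MODEL_DIR = "phi3-mini-4k-instruct-int4"

def generate(prompt: str, max_length: int = 256) -> str:
    """Generate a completion with the ONNX Generative Runtime."""
    import onnxruntime_genai as og

    model = og.Model(MODEL_DIR)          # loads the quantized ONNX graph
    tokenizer = og.Tokenizer(model)
    params = og.GeneratorParams(model)
    params.set_search_options(max_length=max_length)

    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode(prompt))
    while not generator.is_done():       # token-by-token decoding loop
        generator.generate_next_token()
    return tokenizer.decode(generator.get_sequence(0))

if __name__ == "__main__" and os.path.isdir(MODEL_DIR):
    # Phi-3 chat template markers; only runs if the model folder exists.
    print(generate("<|user|>What is int4 quantization?<|end|><|assistant|>"))
```

The int4 export trades a small accuracy loss for roughly a 4x reduction in weight storage versus fp16, which is what makes single-GPU inference of these models practical.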

Syllabus

LLMOps: Quantization models & Inference ONNX Generative Runtime #datascience #machinelearning


Taught by

The Machine Learning Engineer

Related Courses

Digital Signal Processing
École Polytechnique Fédérale de Lausanne via Coursera
Principles of Communication Systems - I
Indian Institute of Technology Kanpur via Swayam
Digital Signal Processing 2: Filtering
École Polytechnique Fédérale de Lausanne via Coursera
Digital Signal Processing 3: Analog vs Digital
École Polytechnique Fédérale de Lausanne via Coursera
Digital Signal Processing 4: Applications
École Polytechnique Fédérale de Lausanne via Coursera