YoVDO

LLMOps: Quantization Models and Inference with ONNX Generative Runtime

Offered By: The Machine Learning Engineer via YouTube

Tags

LLMOps Courses Data Science Courses Machine Learning Courses Quantization Courses Generative Models Courses Model Compression Courses ONNX Runtime Courses Phi-3 Courses

Course Description

Overview

Explore the world of LLMOps through a 30-minute video focusing on model quantization and inference using the ONNX Generative Runtime. Learn how to install ONNX Runtime with GPU support and perform inference with a generative model, specifically a Phi-3-mini-4k quantized to int4. Dive into the process of converting the original Phi-3-mini-128k into an int4-quantized version using the ONNX runtime. Access the accompanying notebook on GitHub to follow along and gain hands-on experience in this cutting-edge area of data science and machine learning.
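The workflow described above can be sketched as shell commands. This is a minimal sketch based on the onnxruntime-genai project's packaging and model-builder tool; the output directory name is an illustrative assumption, and exact flags may vary by version:

```shell
# Install ONNX Runtime's generative-AI package with GPU (CUDA) support;
# the plain "onnxruntime-genai" package is the CPU-only alternative.
pip install onnxruntime-genai-cuda

# Convert the original Phi-3-mini-128k model into an int4-quantized
# ONNX model using the builder tool (downloads weights from Hugging Face).
#   -m  source model id
#   -o  output directory (illustrative path)
#   -p  target precision (int4)
#   -e  execution provider (cuda for GPU, cpu otherwise)
python -m onnxruntime_genai.models.builder \
    -m microsoft/Phi-3-mini-128k-instruct \
    -o ./phi3-mini-128k-int4 \
    -p int4 \
    -e cuda
```

The quantized model written to the output directory can then be loaded for inference with the onnxruntime-genai Python API, as demonstrated in the video's accompanying notebook.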

Syllabus

LLMOps: Quantization models & Inference ONNX Generative Runtime #datascience #machinelearning


Taught by

The Machine Learning Engineer

Related Courses

Data Analysis
Johns Hopkins University via Coursera
Computing for Data Analysis
Johns Hopkins University via Coursera
Scientific Computing
University of Washington via Coursera
Introduction to Data Science
University of Washington via Coursera
Web Intelligence and Big Data
Indian Institute of Technology Delhi via Coursera