LLMOps: Quantization Models and Inference with ONNX Generative Runtime
Offered By: The Machine Learning Engineer via YouTube
Course Description
Overview
Explore the world of LLMOps through a 30-minute video on model quantization and inference with the ONNX Generative Runtime. Learn how to install the ONNX runtime with GPU support and run inference with a generative model, specifically a Phi-3-mini-4k model quantized to int4. Dive into the process of converting the original Phi-3-mini-128k into an int4-quantized version using the ONNX runtime. Access the accompanying notebook on GitHub to follow along and gain hands-on experience in this cutting-edge area of data science and machine learning.
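The workflow described above (GPU install, then int4 conversion of Phi-3-mini-128k) can be sketched with the model-builder tool that ships with the onnxruntime-genai package; exact package names, flags, and the output path shown here are illustrative and may differ by release, so check the version you install.

```shell
# Install ONNX Runtime GenAI with CUDA (GPU) support;
# the plain `onnxruntime-genai` package targets CPU-only inference.
pip install onnxruntime-genai-cuda

# Convert the original Phi-3-mini-128k model from Hugging Face into an
# int4-quantized ONNX model using the bundled model builder.
# -m : source model id   -o : output directory (illustrative path)
# -p : target precision  -e : execution provider (cuda or cpu)
python -m onnxruntime_genai.models.builder \
  -m microsoft/Phi-3-mini-128k-instruct \
  -o ./phi3-mini-128k-int4 \
  -p int4 \
  -e cuda
```

The resulting directory can then be loaded for inference through the onnxruntime-genai generation API, as demonstrated in the video's notebook.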
Syllabus
LLMOps: Quantization models & Inference ONNX Generative Runtime #datascience #machinelearning
Taught by
The Machine Learning Engineer
Related Courses
Fine-tuning Phi-3 for LeetCode: Dataset Generation and Unsloth Implementation (All About AI via YouTube)
LLM News: GPT-4, Project Astra, Veo, Copilot+ PCs, Gemini 1.5 Flash, and Chameleon (Elvis Saravia via YouTube)
LLM Tool Use - GPT4o-mini, Groq, and Llama.cpp (Trelis Research via YouTube)
LoRA Fine-tuning Explained - Choosing Parameters and Optimizations (Trelis Research via YouTube)
Comparing LLAMA 3, Phi 3, and GPT-3.5 Turbo AI Agents for Web Search Performance (Data Centric via YouTube)