LLMOps: Quantizing Models and Inference with ONNX Generative Runtime

Offered By: The Machine Learning Engineer via YouTube

Course Description

Overview


Learn how to install the ONNX Runtime with GPU support to run inference with generative models in this 39-minute tutorial. Explore the quantization process using a Phi3-mini-4k model at 4-bit integer (int4) precision, and convert a Phi3-mini-128k model to int4 with the ONNX runtime. Follow the hands-on implementation step by step using the notebook provided on GitHub to practice LLMOps techniques, model quantization, and inference with ONNX Generative Runtime. Sharpen your data science and machine learning skills with this detailed technical content.
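
For readers unfamiliar with the term, the following is a minimal sketch of what quantizing weights to int4 means arithmetically. It is not the ONNX Runtime tooling used in the video or its notebook; it is a plain-Python illustration of symmetric block-wise quantization, and the function names and the block size of 32 are assumptions chosen for the example.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float], block_size: int = 32) -> Tuple[List[int], List[float]]:
    """Symmetric block-wise quantization to signed 4-bit integers in [-8, 7].

    Illustrative only: the names and block_size are hypothetical, not an ONNX Runtime API.
    """
    quantized: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # One scale per block; fall back to 1.0 when the block is all zeros.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in block:
            # Round to the nearest integer and clamp to the signed 4-bit range.
            quantized.append(max(-8, min(7, round(w / scale))))
    return quantized, scales


def dequantize_int4(quantized: List[int], scales: List[float], block_size: int = 32) -> List[float]:
    """Recover approximate floats by multiplying each value by its block's scale."""
    return [q * scales[i // block_size] for i, q in enumerate(quantized)]


if __name__ == "__main__":
    w = [0.7, -0.35, 0.0, 0.1]
    q, s = quantize_int4(w, block_size=4)
    print(q, s)
    print(dequantize_int4(q, s, block_size=4))
```

Each weight is stored in 4 bits plus one shared scale per block, which is where the memory saving comes from; the cost is a rounding error of at most half a scale step per weight.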

Syllabus

LLMOps: Quantizing Models and Inference with ONNX Generative Runtime #datascience #machinelearning

Taught by

The Machine Learning Engineer

Related Courses

Learning Machine Learning with .NET, PyTorch and the ONNX Runtime
Microsoft via YouTube

Using Apache OpenNLP with OpenSearch K-NN Vector Search
Linux Foundation via YouTube

Accelerating High-Performance Machine Learning at Scale in Kubernetes
CNCF [Cloud Native Computing Foundation] via YouTube

LLMs Fine Tuning and Inferencing Using ONNX Runtime - Workshop
Linux Foundation via YouTube

Real-Time Inference of Neural Networks: A Guide for DSP Engineers
ADC - Audio Developer Conference via YouTube