Fine-tuning Pixtral - Multi-modal Vision and Text Model
Offered By: Trelis Research via YouTube
Overview
How to fine-tune Pixtral.
Syllabus
Video Overview
Pixtral architecture and design choices
Mistral’s custom image encoder - trained from scratch
Fine-tuning Pixtral in a Jupyter notebook
GPU setup for notebook fine-tuning and VRAM requirements
Getting a “transformers” version of Pixtral for fine-tuning
Loading Pixtral
Dataset loading and preparation
Chat templating (somewhat advanced, but recommended)
Inspecting and evaluating baseline performance on the custom data
Setting up data collation, including for multi-turn training
Training on completions only (tricky, but improves performance)
Setting up LoRA fine-tuning
Setting up training arguments: batch size, learning rate, gradient checkpointing
Setting up TensorBoard
Evaluating the trained model
Merging LoRA adapters and pushing the model to hub
Measuring performance on OCR (optical character recognition)
Running inference on Pixtral with vLLM and setting up an API endpoint
Video resources
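Two syllabus items, multi-turn data collation and training on completions only, come down to one idea: compute the loss only on the assistant's tokens. The sketch below is not the code from the video. It is a dependency-free Python illustration of the general technique under stated assumptions. Each conversation arrives as a hypothetical list of `(role, token_ids)` turns that have already been tokenized with the chat template applied. Tokens that should not be trained on are labeled `-100`, which is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The names `build_example` and `collate` are illustrative only.

```python
from typing import Dict, List, Tuple

# Label value skipped by the loss: -100 is the default ignore_index of
# PyTorch's CrossEntropyLoss, used by Hugging Face causal LM losses.
IGNORE_INDEX = -100

# Hypothetical input format (an assumption): each turn is (role, token_ids),
# already tokenized with the chat template applied.
Turn = Tuple[str, List[int]]


def build_example(turns: List[Turn]) -> Dict[str, List[int]]:
    """Concatenate all turns; only assistant tokens keep real labels."""
    input_ids: List[int] = []
    labels: List[int] = []
    for role, ids in turns:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)  # train on every assistant reply (multi-turn)
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # mask system/user tokens
    return {"input_ids": input_ids, "labels": labels}


def collate(examples: List[Dict[str, List[int]]], pad_id: int) -> Dict[str, List[List[int]]]:
    """Right-pad a batch to its longest sequence; padding is never trained on."""
    max_len = max(len(ex["input_ids"]) for ex in examples)
    batch: Dict[str, List[List[int]]] = {"input_ids": [], "attention_mask": [], "labels": []}
    for ex in examples:
        n = len(ex["input_ids"])
        pad = max_len - n
        batch["input_ids"].append(ex["input_ids"] + [pad_id] * pad)
        batch["attention_mask"].append([1] * n + [0] * pad)
        batch["labels"].append(ex["labels"] + [IGNORE_INDEX] * pad)
    return batch


if __name__ == "__main__":
    # Toy two-round conversation with made-up token ids.
    ex = build_example([("user", [10, 11, 12]), ("assistant", [20, 21]),
                        ("user", [13]), ("assistant", [22])])
    print(ex["labels"])  # [-100, -100, -100, 20, 21, -100, 22]
```

In a real vision-language setup, any image placeholder tokens sit inside user turns, so this scheme masks them along with the rest of the prompt. The labels stay aligned with `input_ids` because Hugging Face causal LM models shift them internally when computing the loss.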
Taught by
Trelis Research