Training and Serving Custom Multi-modal Models - IDEFICS 2 and LLaVA Llama 3
Offered By: Trelis Research via YouTube
Course Description
Overview
Learn to train and serve custom multi-modal models using IDEFICS 2 and LLaVA Llama 3 in this comprehensive tutorial video. Start with an overview of the IDEFICS 2 model, then cover model loading, evaluation, and LoRA setup. Evaluate OCR performance and handling of multiple image inputs. Dive into the training and fine-tuning process, and review the LLaVA Llama 3 model. Set up a multi-modal inference endpoint and understand the VRAM requirements of these models. Discover why IDEFICS 2 is the recommended foundation for building custom multi-modal applications. Access additional resources, including complete scripts, one-click fine-tuning templates, and community support to enhance your learning experience.
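As a taste of the model-loading and LoRA-setup chapters, here is a minimal sketch that loads IDEFICS 2 from the Hugging Face Hub and attaches a LoRA adapter with the peft library. The checkpoint name, adapter rank, and target modules are illustrative assumptions and may differ from the exact settings used in the video.

import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from peft import LoraConfig, get_peft_model

# Assumed checkpoint; the video may use a different IDEFICS 2 variant.
model_id = "HuggingFaceM4/idefics2-8b"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce VRAM
    device_map="auto",          # place layers on available GPUs automatically
)

# LoRA freezes the base weights and trains small low-rank adapters instead.
lora_config = LoraConfig(
    r=8,                        # adapter rank (assumed)
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections (assumed)
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows how few parameters LoRA actually trains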
Syllabus
Fine-tuning and server setup for multi-modal models
Prerequisites / pre-watching
IDEFICS 2 Model Overview
Model loading, evaluation and LoRA setup
Evaluating OCR performance
Evaluating multiple image inputs
Training / Fine-tuning
LLaVA Llama 3 Model Review
Multi-modal inference endpoint (a minimal serving sketch follows this syllabus)
VRAM Requirements for multi-modal models
IDEFICS 2 - my recommended model to build on
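To illustrate the inference-endpoint chapter, the sketch below wraps the model and processor in a small FastAPI service. FastAPI, the route name, and the request fields are assumptions made for illustration; the video may use a different serving stack. On VRAM, a rough rule of thumb is that an 8B-parameter model such as IDEFICS 2 needs about 16 GB just for the fp16 weights (8 billion parameters x 2 bytes each), before activations, image features, and the KV cache.

import io

import requests
import torch
from fastapi import FastAPI
from PIL import Image
from pydantic import BaseModel
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceM4/idefics2-8b"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

app = FastAPI()

class GenerateRequest(BaseModel):
    image_url: str   # hypothetical request schema for illustration
    question: str

@app.post("/generate")
def generate(req: GenerateRequest):
    # Fetch the image and build a single-turn chat prompt with one image slot.
    image = Image.open(io.BytesIO(requests.get(req.image_url).content)).convert("RGB")
    messages = [{"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": req.question},
    ]}]
    prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=128)
    return {"answer": processor.batch_decode(output_ids, skip_special_tokens=True)[0]}

If the file is saved as app.py, it can be served with, for example, uvicorn app:app.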
Taught by
Trelis Research
Related Courses
LLaVA: The New Open Access Multimodal AI Model (1littlecoder via YouTube)
Autogen and Local LLMs Create Realistic Stable Diffusion Model Autonomously (kasukanra via YouTube)
Image Annotation with LLaVA and Ollama (Sam Witteveen via YouTube)
Unraveling Multimodality with Large Language Models (Linux Foundation via YouTube)
Efficient and Portable AI/LLM Inference on the Edge Cloud - Workshop (Linux Foundation via YouTube)