
Training and Serving Custom Multi-modal Models - IDEFICS 2 and LLaVA Llama 3

Offered By: Trelis Research via YouTube

Tags

LoRA (Low-Rank Adaptation) Courses
Fine-Tuning Courses
Hugging Face Courses
LLaVA Courses

Course Description

Overview

Learn to train and serve custom multi-modal models using IDEFICS 2 and LLaVA Llama 3 in this comprehensive tutorial video. Explore the IDEFICS 2 model overview, model loading techniques, and LoRA setup. Evaluate OCR performance and handle multiple image inputs. Dive into the training and fine-tuning process, and review the LLaVA Llama 3 model. Set up a multi-modal inference endpoint and understand VRAM requirements for these advanced models. Discover why IDEFICS 2 is recommended as a foundation for building custom multi-modal applications. Access additional resources, including complete scripts, one-click fine-tuning templates, and community support to enhance your learning experience.
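
As a concrete reference point for the loading and LoRA steps the video covers, here is a minimal sketch of loading IDEFICS 2 and attaching a LoRA adapter with the Hugging Face transformers and peft libraries. This is an illustration, not the course's own script: the checkpoint "HuggingFaceM4/idefics2-8b" is the public IDEFICS 2 release, and the LoRA hyperparameters below are generic defaults, not the settings used in the tutorial.

import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from peft import LoraConfig, get_peft_model

model_id = "HuggingFaceM4/idefics2-8b"  # public IDEFICS 2 checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce VRAM use
    device_map="auto",
)

# Attach low-rank adapters to the attention projections so only a small
# fraction of the weights is updated during fine-tuning (illustrative values).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports the small trainable fraction

Training only the low-rank adapters is what makes fine-tuning an 8B-parameter multi-modal model feasible on a single GPU, which is also why VRAM requirements are a dedicated chapter below.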

Syllabus

Fine-tuning and server setup for multi-modal models
Prerequisites / recommended pre-watching
IDEFICS 2 Model Overview
Model loading, evaluation and LoRA setup
Evaluating OCR performance (see the inference sketch after this syllabus)
Evaluating multiple image inputs
Training / Fine-tuning
LLaVA Llama 3 Model Review
Multi-modal inference endpoint
VRAM Requirements for multi-modal models
IDEFICS 2 - my recommended model to build on
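
For the OCR-evaluation and inference chapters above, the following is a minimal sketch of prompting IDEFICS 2 with an image via the processor's chat template. It reuses the processor and model from the loading sketch earlier; the image file and prompt text are hypothetical stand-ins, not the course's test data.

from PIL import Image

# Reuses `processor` and `model` from the LoRA loading sketch above.
image = Image.open("receipt.png")  # hypothetical test image for an OCR check
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Transcribe all text in this image."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])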


Taught by

Trelis Research

Related Courses

LLaVA: The New Open Access Multimodal AI Model
1littlecoder via YouTube
Autogen and Local LLMs Create Realistic Stable Diffusion Model Autonomously
kasukanra via YouTube
Image Annotation with LLaVA and Ollama
Sam Witteveen via YouTube
Unraveling Multimodality with Large Language Models
Linux Foundation via YouTube
Efficient and Portable AI/LLM Inference on the Edge Cloud - Workshop
Linux Foundation via YouTube