Tiny Text and Vision Models - Fine-Tuning and API Setup
Offered By: Trelis Research via YouTube
Course Description
Overview
Explore the intricacies of fine-tuning and deploying tiny text and vision models in this 44-minute tutorial. Dive into the architecture of multi-modal models, focusing on the Moondream model's components including its vision encoder (SigLIP), MLP (visionprojection), and language model (Phi). Learn how to apply LoRA adapters to multi-modal models and follow along with a hands-on fine-tuning notebook demo. Discover techniques for deploying custom APIs for multi-modal models, utilizing vLLM, and training models from scratch. Gain insights into multi-modal datasets and access a wealth of video resources to further your understanding of advanced vision and language processing techniques.
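The data flow described above (vision encoder, then projection MLP, then language model) can be sketched in a few lines of plain Python. This is a toy illustration, not Moondream's code: every name, dimension, and weight below is hypothetical, the "encoder" is a single random linear map standing in for SigLIP, and ReLU is used in the MLP only for simplicity. The point it shows is that the projection maps image features into the same embedding width as the text tokens, so both can be concatenated into one input sequence for the language model.

```python
import random

random.seed(0)

# Hypothetical toy sizes, chosen only for illustration.
PATCH_SIZE = 3   # values per image patch
VISION_DIM = 4   # width of the stand-in vision encoder output
TEXT_DIM = 6     # width of the stand-in language model embeddings
VOCAB_SIZE = 10  # number of text tokens in the toy vocabulary


def rand_matrix(rows, cols):
    """Random weights; a real model would load trained parameters."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


W_ENC = rand_matrix(VISION_DIM, PATCH_SIZE)   # stand-in for the vision encoder
W1 = rand_matrix(TEXT_DIM, VISION_DIM)        # projection MLP, layer 1
W2 = rand_matrix(TEXT_DIM, TEXT_DIM)          # projection MLP, layer 2
EMBED = rand_matrix(VOCAB_SIZE, TEXT_DIM)     # text token embedding table


def vision_encoder(patches):
    """One feature vector per image patch."""
    return [matvec(W_ENC, patch) for patch in patches]


def projection_mlp(features):
    """Map vision features into the text embedding space."""
    projected = []
    for feature in features:
        hidden = [max(0.0, h) for h in matvec(W1, feature)]  # ReLU (simplification)
        projected.append(matvec(W2, hidden))
    return projected


def build_lm_input(patches, token_ids):
    """Concatenate projected image tokens with text token embeddings."""
    image_tokens = projection_mlp(vision_encoder(patches))
    text_tokens = [EMBED[t] for t in token_ids]
    return image_tokens + text_tokens


sequence = build_lm_input([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], [1, 2, 3])
print(len(sequence), len(sequence[0]))  # prints "5 6"
```

Two image patches plus three text tokens give a sequence of five vectors, each of the language model's embedding width.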
Syllabus
Fine-tuning tiny multi-modal models
Moondream server demo
Video Overview
Multi-modal model architecture
Moondream architecture
Moondream vision encoder SigLIP
Moondream MLP visionprojection
Moondream Language Model Phi
Applying LoRA adapters to a multi-modal model
Fine-tuning notebook demo
Deploying a custom API for multi-modal models
vLLM
Training a multi-modal model from scratch
Multi-modal datasets
Video resources
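As a rough illustration of the syllabus item on LoRA adapters: a LoRA adapter leaves a layer's original weight matrix W frozen and learns a low-rank update, so the layer computes W x + (alpha / r) * B (A x). The sketch below is a minimal pure-Python version with hypothetical names (LoRALinear, rank, alpha) and toy initial values. It does not reflect which Moondream modules the video adapts or the library it uses.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class LoRALinear:
    """Toy linear layer with a LoRA adapter (hypothetical, for illustration)."""

    def __init__(self, weight, rank, alpha):
        self.weight = weight  # frozen base weight, shape (out_dim, in_dim)
        out_dim, in_dim = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A gets small non-zero values; B starts at zero so the adapter
        # initially leaves the layer's output unchanged.
        self.A = [[0.01 * (i + j + 1) for j in range(in_dim)] for i in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        base = matvec(self.weight, x)                # frozen path: W x
        update = matvec(self.B, matvec(self.A, x))   # low-rank path: B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        """Only A and B would be updated during fine-tuning."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
layer = LoRALinear(identity, rank=1, alpha=2.0)
print(layer.forward([1.0, 1.0, 1.0]))  # same as the base layer while B is zero
print(layer.trainable_params())        # prints 6 (versus 9 frozen base weights)
```

The saving is negligible at this toy size, but for a layer with thousands of inputs and outputs a small rank leaves only a small fraction of the parameters trainable.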
Taught by
Trelis Research
Related Courses
Finetuning, Serving, and Evaluating Large Language Models in the Wild
Open Data Science via YouTube
Cloud Native Sustainable LLM Inference in Action
CNCF [Cloud Native Computing Foundation] via YouTube
Optimizing Kubernetes Cluster Scaling for Advanced Generative Models
Linux Foundation via YouTube
LLaMa for Developers
LinkedIn Learning
Scaling Video Ad Classification Across Millions of Classes with GenAI
Databricks via YouTube