Tiny Text and Vision Models - Fine-Tuning and API Setup
Offered By: Trelis Research via YouTube
Course Description
Overview
Explore the intricacies of fine-tuning and deploying tiny text and vision models in this 44-minute tutorial. Dive into the architecture of multi-modal models, focusing on the Moondream model's components: its vision encoder (SigLIP), MLP vision projection, and language model (Phi). Learn how to apply LoRA adapters to multi-modal models and follow along with a hands-on fine-tuning notebook demo. Discover techniques for deploying custom APIs for multi-modal models using vLLM, and for training models from scratch. Gain insights into multi-modal datasets and access a wealth of video resources to deepen your understanding of advanced vision and language processing techniques.
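The fine-tuning approach the course describes attaches LoRA adapters to a multi-modal checkpoint. As a minimal sketch (not the course's own notebook), the Hugging Face PEFT library can attach adapters as shown below; the model id and the `target_modules` names are illustrative assumptions and depend on the checkpoint's actual module layout:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a small multi-modal checkpoint (id shown for illustration;
# Moondream-style models typically require trust_remote_code=True).
model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2", trust_remote_code=True
)

# Attach low-rank adapters to attention projections on the language-model
# side, leaving the SigLIP vision encoder frozen. Module names below are
# assumed; inspect model.named_modules() to find the real ones.
lora_config = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,                         # adapter scaling factor
    target_modules=["q_proj", "v_proj"],   # assumed projection module names
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # adapters are a small fraction of weights
```

For serving, the video covers deploying a custom API with vLLM. A minimal sketch of vLLM's offline Python API follows; the model id is a stand-in, and whether a given multi-modal checkpoint is supported depends on the vLLM version:

```python
from vllm import LLM, SamplingParams

# Load a model into the vLLM engine (stand-in id; swap in your own).
llm = LLM(model="microsoft/phi-2")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What does a vision encoder do?"], params)
print(outputs[0].outputs[0].text)
```

For an HTTP endpoint, vLLM also ships an OpenAI-compatible server (`python -m vllm.entrypoints.openai.api_server --model <model-id>`), which is one common way to expose a fine-tuned model as a custom API.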
Syllabus
Fine-tuning tiny multi-modal models
Moondream server demo
Video Overview
Multi-modal model architecture
Moondream architecture
Moondream vision encoder SigLIP
Moondream MLP vision projection
Moondream Language Model Phi
Applying LoRA adapters to a multi-modal model
Fine-tuning notebook demo
Deploying a custom API for multi-modal models
vLLM
Training a multi-modal model from scratch
Multi-modal datasets
Video resources
Taught by
Trelis Research
Related Courses
API Design and Fundamentals of Google Cloud's Apigee API Platform
Google Cloud via Coursera
API Development on Google Cloud's Apigee API Platform
Google Cloud via Coursera
On Premises Management, Security, and Upgrade with Google Cloud's Apigee API Platform
Google Cloud via Coursera
Create a REST API With Node JS and Mongo DB
Udemy
AWS Networking and the API Gateway
Pluralsight