Mastering Google's PaliGemma VLM: Tips and Tricks for Success and Fine-Tuning
Offered By: Sam Witteveen via YouTube
Course Description
Overview
Explore Google's vision language model PaliGemma in this video tutorial. Learn about the model's architecture, capabilities, and applications through an overview of the PaLI-3 and SigLIP papers. Discover PaliGemma's three pre-trained checkpoint types and its available sizes and releases. Try the model in a Hugging Face Spaces demo and explore the ScreenAI datasets. Then dive into practical coding sessions covering inference with PaliGemma in Transformers and fine-tuning techniques. Colab notebooks for both inference and fine-tuning are provided to help you implement this vision language model yourself.
Syllabus
Intro
What is PaliGemma?
PaLI-3 Paper
SigLIP Paper
Hugging Face Blog: PaliGemma
PaliGemma: Three Pre-trained Checkpoints
PaliGemma: Different Sizes and Releases
PaliGemma Hugging Face Spaces Demo
ScreenAI Datasets
Code Time
Using PaliGemma with Transformers
PaliGemma Fine-Tuning
Taught by
Sam Witteveen
Related Courses
Fine-tuning PaliGemma for Custom Object Detection (Roboflow via YouTube)
Florence-2: The Best Small Vision Language Model - Capabilities and Demo (Sam Witteveen via YouTube)
Fine-tuning Florence-2: Microsoft's Multimodal Model for Custom Object Detection (Roboflow via YouTube)
OpenVLA: An Open-Source Vision-Language-Action Model - Research Presentation (HuggingFace via YouTube)
New Flux IMG2IMG Trick, Upscaling Options, and Prompt Ideas in ComfyUI (Nerdy Rodent via YouTube)