YoVDO

Mastering Google's PaliGemma VLM: Tips and Tricks for Success and Fine-Tuning

Offered By: Sam Witteveen via YouTube

Tags

Machine Learning Courses
Computer Vision Courses
Transformers Courses
Fine-Tuning Courses
Hugging Face Courses
Vision-Language Models Courses

Course Description

Overview

Explore Google's vision-language model PaliGemma in this informative video tutorial. Learn about the model's architecture, capabilities, and applications through an overview of the PaLI-3 and SigLIP papers. Discover PaliGemma's three pre-trained checkpoints and its various sizes and releases. Gain hands-on experience with a Hugging Face Spaces demo and explore the ScreenAI datasets. Dive into practical coding sessions focused on using PaliGemma with Transformers and on fine-tuning techniques. Use the provided resources, including Colab notebooks for inference and fine-tuning, to deepen your understanding and implementation of this vision-language model.

Syllabus

Intro
What is PaliGemma?
PaLI-3 Paper
SigLIP Paper
Hugging Face Blog: PaliGemma
PaliGemma: Three Pre-trained Checkpoints
PaliGemma: Different Sizes and Releases
PaliGemma Hugging Face Spaces Demo
ScreenAI Datasets
Code Time
Using PaliGemma with Transformers
PaliGemma Fine-Tuning


Taught by

Sam Witteveen

Related Courses

Fine-tuning PaliGemma for Custom Object Detection
Roboflow via YouTube
Florence-2: The Best Small Vision Language Model - Capabilities and Demo
Sam Witteveen via YouTube
Fine-tuning Florence-2: Microsoft's Multimodal Model for Custom Object Detection
Roboflow via YouTube
OpenVLA: An Open-Source Vision-Language-Action Model - Research Presentation
HuggingFace via YouTube
New Flux IMG2IMG Trick, Upscaling Options, and Prompt Ideas in ComfyUI
Nerdy Rodent via YouTube