YoVDO

Florence-2: The Best Small Vision Language Model - Capabilities and Demo

Offered By: Sam Witteveen via YouTube

Tags

Computer Vision Courses, Artificial Intelligence Courses, Image Captioning Courses, Hugging Face Courses, Vision-Language Models Courses

Course Description

Overview

Explore the capabilities of Florence-2, a new Vision Language Model (VLM) trained on a dataset of about 5 billion annotations, in this informative video. Learn about its architecture and its main functions: detailed image captioning, visual grounding, dense region captioning, and open vocabulary detection. Watch demonstrations of the model's performance in Hugging Face Spaces and examine sample usage of the large model in a Colab notebook. Gain insights into how Florence-2 combines traditional computer vision tasks, such as detection and grounding, with modern LLM-style captioning in a single small model.

Syllabus

Intro
Florence-2 Paper
Florence-2 Architecture
Florence-2 Detailed Image Captioning
Florence-2 Visual Grounding
Florence-2 Dense Region Caption
Florence-2 Open Vocab Detection
Hugging Face Spaces Demo
Colab Florence-2 Large Sample Usage
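
For readers who want to try the tasks in the syllabus before watching, here is a minimal sketch of how Florence-2 is typically called through Hugging Face transformers. It is not taken from the video or its Colab notebook. The task tokens and the `post_process_generation` call follow the public `microsoft/Florence-2-large` model card as best recalled, and the helper names `TASK_TOKENS`, `build_prompt`, and `run_task` are hypothetical, chosen only for this illustration.

```python
# Sketch only: task tokens follow the public Florence-2 model card;
# TASK_TOKENS, build_prompt and run_task are hypothetical helper names.
TASK_TOKENS = {
    "detailed_caption": "<DETAILED_CAPTION>",
    "visual_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "dense_region_caption": "<DENSE_REGION_CAPTION>",
    "open_vocab_detection": "<OPEN_VOCABULARY_DETECTION>",
}


def build_prompt(task, text=""):
    """Return the task token, followed by any extra text input.

    Grounding and open vocabulary detection take a phrase after the token;
    the captioning tasks use the token alone.
    """
    return TASK_TOKENS[task] + text


def run_task(image, task, text="", model_id="microsoft/Florence-2-large"):
    """Run one task on a PIL image and return the parsed result dict.

    Assumes transformers and torch are installed; the model weights are
    downloaded on first use. Imports are local so that build_prompt can be
    used without those packages.
    """
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    inputs = processor(text=build_prompt(task, text), images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    # Keep special tokens: the post-processor parses location tokens from them.
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        generated_text,
        task=TASK_TOKENS[task],
        image_size=(image.width, image.height),
    )
```

A call such as `run_task(image, "open_vocab_detection", "a green car")` would, under these assumptions, return a dictionary keyed by the task token. The same model handles every task, and only the prompt changes.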


Taught by

Sam Witteveen

Related Courses

Deep Learning For Visual Computing
Indian Institute of Technology, Kharagpur via Swayam
Literacy Essentials: Core Concepts Generative Adversarial Network
Pluralsight
Machine Learning & Deep Learning Projects
The AI University via YouTube
Implement Image Captioning with Recurrent Neural Networks
Pluralsight
VirTex: Learning Visual Representations from Textual Annotations
Yannic Kilcher via YouTube