Florence-2: The Best Small Vision Language Model - Capabilities and Demo
Offered By: Sam Witteveen via YouTube
Course Description
Overview
Explore the capabilities of Florence-2, a small Vision Language Model (VLM) trained on a dataset of roughly 5 billion annotations, in this video. Learn about its architecture and its main functions: detailed image captioning, visual grounding, dense region captioning, and open vocabulary detection. Watch demonstrations of the model's performance in Hugging Face Spaces and walk through sample usage in a Colab notebook. See how Florence-2 handles traditional computer vision tasks such as detection and grounding alongside LLM-style captioning in a single model.
Syllabus
Intro
Florence-2 Paper
Florence-2 Architecture
Florence-2 Detailed Image Captioning
Florence-2 Visual Grounding
Florence-2 Dense Region Caption
Florence-2 Open Vocab Detection
Hugging Face Spaces Demo
Colab Florence-2 Large Sample Usage
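The tasks in the syllabus can be sketched in code as follows. This is a minimal illustration, not the notebook from the video: it assumes the publicly released `microsoft/Florence-2-large` checkpoint on Hugging Face and the task prompt tokens listed on its model card. The helper names `build_prompt` and `run_florence2` and the short task keys are hypothetical, chosen for this example only.

```python
# Sketch of prompting Florence-2 for the tasks named in the syllabus.
# Assumption: prompt tokens follow the microsoft/Florence-2-large model card.
# build_prompt / run_florence2 and the dictionary keys are illustrative names.

TASK_PROMPTS = {
    "detailed_caption": "<DETAILED_CAPTION>",
    "visual_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "dense_region_caption": "<DENSE_REGION_CAPTION>",
    "open_vocab_detection": "<OPEN_VOCABULARY_DETECTION>",
}

# Tasks whose prompt is the task token followed by user-supplied text.
TASKS_WITH_TEXT = {"visual_grounding", "open_vocab_detection"}


def build_prompt(task, text=None):
    """Return the prompt string for a task, appending text where required."""
    token = TASK_PROMPTS[task]
    if task in TASKS_WITH_TEXT:
        if not text:
            raise ValueError(f"task '{task}' needs accompanying text")
        return token + text
    return token


def run_florence2(image, task, text=None, model_id="microsoft/Florence-2-large"):
    """Run one task on a PIL image. Downloads the model on first call."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    prompt = build_prompt(task, text)
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # Converts the raw token string into captions, boxes, and labels.
    return processor.post_process_generation(
        raw, task=TASK_PROMPTS[task], image_size=(image.width, image.height)
    )
```

For example, `run_florence2(image, "visual_grounding", "a green car")` would send the prompt `<CAPTION_TO_PHRASE_GROUNDING>a green car`, while the captioning tasks take the task token alone.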
Taught by
Sam Witteveen
Related Courses
Hugging Face on Azure - Partnership and Solutions Announcement (Microsoft via YouTube)
Question Answering in Azure AI - Custom and Prebuilt Solutions - Episode 49 (Microsoft via YouTube)
Open Source Platforms for MLOps (Duke University via Coursera)
Masked Language Modelling - Retraining BERT with Hugging Face Trainer - Coding Tutorial (rupert ai via YouTube)
Masked Language Modelling with Hugging Face - Microsoft Sentence Completion - Coding Tutorial (rupert ai via YouTube)