Meet FLAVA: A Unified Vision and Language Foundation Model

Offered By: Snorkel AI via YouTube

Tags

Multimodal AI Courses Computer Vision Courses Transformers Courses Foundation Models Courses Hugging Face Courses

Course Description

Overview

Explore the development and capabilities of FLAVA, a unified vision and language model, in this 21-minute conference talk presented by Amanpreet Singh, Research Lead at Hugging Face. Dive into the journey towards a single holistic model that handles vision tasks, language tasks, and cross- and multi-modal vision-and-language tasks. Learn about FLAVA's performance across 35 diverse tasks spanning these modalities. Discover the evolution from domain-specific transformer models to UniT (Unified Transformer), and understand how FLAVA takes that unification one step further. Gain insights into the model's architecture, how it works, and how it is evaluated. This presentation, recorded at Snorkel AI's 2023 Foundation Model Virtual Summit, offers valuable background for anyone interested in state-of-the-art visiolinguistic pretraining and foundation models.
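
The talk itself does not walk through code, but FLAVA's pretrained checkpoint is publicly available through the Hugging Face transformers library. As a minimal, illustrative sketch (assuming transformers, torch, Pillow, and requests are installed, and using the facebook/flava-full checkpoint plus an arbitrary COCO image URL chosen here purely as an example), one way to obtain the vision-only, language-only, and multimodal embeddings the talk describes is:

    # Minimal sketch: querying the pretrained FLAVA checkpoint via Hugging Face
    # transformers to get unimodal and multimodal embeddings.
    import requests
    from PIL import Image
    from transformers import FlavaModel, FlavaProcessor

    processor = FlavaProcessor.from_pretrained("facebook/flava-full")
    model = FlavaModel.from_pretrained("facebook/flava-full")

    # Example image: a standard COCO validation picture (illustrative only).
    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)

    inputs = processor(
        text=["a photo of two cats lying on a couch"],
        images=image,
        return_tensors="pt",
        padding=True,
    )
    outputs = model(**inputs)

    # One forward pass yields representations from all three encoders:
    print(outputs.image_embeddings.shape)       # vision encoder output
    print(outputs.text_embeddings.shape)        # text encoder output
    print(outputs.multimodal_embeddings.shape)  # multimodal encoder output

This mirrors the unified design discussed in the talk: separate image and text encoders plus a multimodal encoder on top, so the same model can be evaluated on vision-only, language-only, and cross-modal tasks.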

Syllabus

Intro
How do we build foundation models?
Successes of transformers in (specific) domains
UniT: Unified Transformer across domains
Can we take it one step further?
How does FLAVA work?
Stepping up the evaluation


Taught by

Snorkel AI

Related Courses

Hugging Face on Azure - Partnership and Solutions Announcement
Microsoft via YouTube
Question Answering in Azure AI - Custom and Prebuilt Solutions - Episode 49
Microsoft via YouTube
Open Source Platforms for MLOps
Duke University via Coursera
Masked Language Modelling - Retraining BERT with Hugging Face Trainer - Coding Tutorial
rupert ai via YouTube
Masked Language Modelling with Hugging Face - Microsoft Sentence Completion - Coding Tutorial
rupert ai via YouTube