Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

Offered By: USC Information Sciences Institute via YouTube

Course Description

Overview

Explore a groundbreaking unified model for AI tasks in this 49-minute talk presented by Jiasen Lu from AI2. Delve into Unified-IO, the first neural model capable of performing a wide range of tasks across computer vision, image synthesis, vision-and-language, and natural language processing. Learn how this model homogenizes diverse task inputs and outputs into token sequences, achieving broad unification. Discover the model's architecture, training objectives, dataset implementations, and pre-training distribution. Examine evaluation methods, including the GRIT benchmark, and analyze results across various tasks such as semantic segmentation, depth estimation, object detection, image inpainting, and segmentation-based image generation. Gain insights into the future of multi-modal AI models and their potential impact on the field.
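As a rough illustration of what turning task outputs into token sequences can look like, the sketch below converts an object-detection result (a bounding box plus a class label) into a flat list of tokens. It is a minimal sketch under stated assumptions, not the model's actual implementation: the function names, the `<loc_N>` token format, and the choice of 1000 bins are all hypothetical and chosen for illustration only.

```python
def box_to_tokens(box, image_size, num_bins=1000):
    """Map a pixel-space box (x0, y0, x1, y1) to discrete location tokens.

    The "<loc_N>" token format and the default num_bins are assumptions
    made for this example.
    """
    width, height = image_size
    x0, y0, x1, y1 = box
    tokens = []
    for value, extent in ((x0, width), (y0, height), (x1, width), (y1, height)):
        # Normalize the coordinate to [0, 1], then snap it to an integer bin.
        # Clamp so a coordinate on the far edge still maps to a valid bin.
        bin_index = min(int(value / extent * num_bins), num_bins - 1)
        tokens.append(f"<loc_{bin_index}>")
    return tokens


def detection_target(label, box, image_size):
    """Build one output sequence: four location tokens, then the label words."""
    return box_to_tokens(box, image_size) + label.split()


if __name__ == "__main__":
    print(detection_target("traffic light", (0, 0, 320, 240), (640, 480)))
```

Once a structured output such as a box is written this way, it can share a vocabulary and a decoder with ordinary text, which is the sense in which very different tasks can be handled by one sequence-to-sequence model.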

Syllabus

Intro
Single-Task Model vs. Unified Model
Single-Task Model for Vision
Image Output Quantization
Text Input for Different Tasks
Model Details
Objective
Dataset and Implementations
Pre-training Distribution
Evaluation
GRIT requires diverse skills
Results
Semantic Segmentation
Depth Estimation
Object Detection
Image Inpainting
Segmentation-based image generation
Summary
Tasks Distribution

Taught by

USC Information Sciences Institute
