Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models

Offered By: Center for Language & Speech Processing (CLSP), JHU via YouTube

Tags

Computer Vision Courses Object Detection Courses Semantics Courses Localization Courses

Course Description

Overview

Explore a comparative analysis of visual representations in vision-and-language models versus vision-only models in this 10-minute conference talk from EACL 2024. Delve into research by Zhuowan Li from the Center for Language & Speech Processing at JHU, which probes learned representations across a wide range of tasks to assess their quality. Discover findings suggesting that vision-and-language models excel at label prediction tasks such as object and attribute prediction, while vision-only models perform better on dense prediction tasks that require more localized information. Gain insights into the role of language in visual learning and obtain an empirical guide to various pretrained models, contributing to the ongoing discussion about whether joint learning paradigms help in understanding individual modalities.
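To make the probing setup concrete, the sketch below shows a generic linear-probing recipe of the kind such studies typically use: a pretrained vision backbone is frozen, features are extracted, and a small linear classifier is trained on top for a label prediction task. This is not the authors' code; the backbone, dataset, and hyperparameters are illustrative assumptions, and a CLIP-style vision-and-language encoder could be swapped in for comparison.

```python
# Minimal linear-probing sketch (illustrative, not the paper's implementation):
# freeze a pretrained vision backbone, extract features, train a linear probe.
import torch
import torch.nn as nn
from torchvision import models, transforms
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"

# Vision-only backbone (assumed here: ImageNet-pretrained ResNet-50).
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = nn.Identity()              # expose 2048-d pooled features
backbone.eval().to(device)
for p in backbone.parameters():
    p.requires_grad = False              # representations stay frozen

probe = nn.Linear(2048, 10).to(device)   # linear probe over frozen features
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tfm = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
loader = DataLoader(
    CIFAR10(root="data", train=True, download=True, transform=tfm),
    batch_size=64, shuffle=True,
)

for images, labels in loader:
    images, labels = images.to(device), labels.to(device)
    with torch.no_grad():
        feats = backbone(images)         # frozen feature extraction
    logits = probe(feats)                # only the probe is trained
    loss = loss_fn(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The same loop can be repeated with different frozen backbones so that probe accuracy reflects the quality of each model's representations rather than any fine-tuning of the backbone itself.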

Syllabus

Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models - EACL 2024


Taught by

Center for Language & Speech Processing (CLSP), JHU

Related Courses

Developing International Software: Part 2
Microsoft via edX
Introduction to Commutative Algebra
Indian Institute of Technology Madras via Swayam
Localization Essentials
Google via Udacity
Self-Driving Fundamentals: Featuring Apollo
Baidu via Udacity
Advanced IOT Applications
Indian Institute of Science Bangalore via Swayam