Visual Question Answering: Grounded Systems and Transformer Capsules

Offered By: University of Central Florida via YouTube

Tags

Computer Vision, Artificial Intelligence, Machine Learning, Deep Learning, Capsule Networks, Image Processing, Transformers

Course Description

Overview

Explore Grounded Visual Question Answering (VQA) in this 22-minute lecture from the University of Central Florida. The lecture examines the limitations of existing VQA systems and how grounded VQA systems aim to address them. It covers the problem setup and the approach presented, which combines transformers with capsules through capsule-based tokens and text-based residual connections. It then reviews the pre-training tasks, including Masked Language Modeling (MLM) and Image-Text Matching (ITM), and the datasets used for pre-training. Next, it walks through fine-tuning on a downstream task, a qualitative comparison on the GQA dataset, and the evaluation metrics and results on GQA. The lecture closes with conclusions and directions for future work.

Syllabus

Intro
Grounded Visual Question Answering
Limitations of Existing VQA Systems
Grounded VQA Systems
Problem Setup
Transformers with Capsules
Approach
Capsule-based Tokens
Input to Intermediate Transformer Layers
Text-based Residual Connection
Pre-training Tasks
Masked Language Modeling (MLM)
Image-Text Matching (ITM)
Pre-training Datasets
Fine-tuning on Downstream Task
Qualitative Comparison: GQA
Evaluation Metrics
Results: GQA
Conclusion and Future Work

Taught by

UCF CRCV
