Do Vision Transformers See Like Convolutional Neural Networks? - Paper Explained

Offered By: Aleksa Gordić - The AI Epiphany via YouTube

Tags

Computer Vision Courses
Artificial Intelligence Courses
Data Analysis Courses
Machine Learning Courses
Convolutional Neural Networks (CNN) Courses
Vision Transformers Courses

Course Description

Overview

Explore a detailed analysis of the paper "Do Vision Transformers See Like Convolutional Neural Networks?" in this 35-minute video. Dive into the dissection of Vision Transformers (ViTs) and ResNets, examining the differences in learned features and the factors contributing to these disparities. Investigate the contrasts between global and local receptive fields, the impact of data quantity, and the importance of skip connections in ViTs. Gain insights into how spatial information is preserved in ViTs and observe the evolution of features as the amount of training data increases. Enhance your understanding of these advanced computer vision architectures through clear explanations and visual intuitions.
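The feature comparisons discussed in the video rest on centered kernel alignment (CKA), the representation-similarity measure the paper uses to compare ViT and ResNet layers. As a rough illustration only, here is a minimal sketch of linear CKA in NumPy; the function and variable names are illustrative and not taken from the paper's code:

```python
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear centered kernel alignment between two representation
    matrices of shape (num_examples, num_features).
    Returns a similarity score in [0, 1]."""
    # Center each feature column so the score is invariant to mean shifts.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 captures the cross-covariance between the two layers.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    # Normalize by each representation's self-similarity.
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)

# Toy usage: compare made-up "layer activations" from two models
# evaluated on the same 512 inputs (shapes are hypothetical).
rng = np.random.default_rng(0)
vit_layer = rng.normal(size=(512, 768))     # e.g. a ViT block output
resnet_layer = rng.normal(size=(512, 256))  # e.g. a ResNet stage output
print(linear_cka(vit_layer, resnet_layer))  # near 0 for random features
print(linear_cka(vit_layer, vit_layer))     # exactly 1 for identical ones
```

Computing this score for every pair of layers across the two architectures produces the similarity heatmaps the video walks through.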

Syllabus

Intro
Contrasting features in ViTs vs CNNs
Global vs Local receptive fields
Data matters, Mr. Obvious
Contrasting receptive fields
Data flow through CLS vs spatial tokens
Skip connections matter a lot in ViTs
Spatial information is preserved in ViTs
Feature evolution with the amount of data
Outro


Taught by

Aleksa Gordić - The AI Epiphany

Related Courses

Vision Transformers Explained + Fine-Tuning in Python
James Briggs via YouTube
ConvNeXt - A ConvNet for the 2020s - Paper Explained
Aleksa Gordić - The AI Epiphany via YouTube
Stable Diffusion and Friends - High-Resolution Image Synthesis via Two-Stage Generative Models
HuggingFace via YouTube
Intro to Dense Vectors for NLP and Vision
James Briggs via YouTube
Geo-localization Framework for Real-world Scenarios - Defense Presentation
University of Central Florida via YouTube