YoVDO

An Image is Worth 16x16 Words - Transformers for Image Recognition at Scale

Offered By: Yannic Kilcher via YouTube

Tags

- Image Recognition Courses
- Artificial Intelligence Courses
- Computer Vision Courses
- Image Processing Courses
- Transformers Courses
- Inductive Bias Courses

Course Description

Overview

This 30-minute video walks through the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", which introduces the Vision Transformer (ViT): an architecture that applies a standard Transformer to a sequence of image patches instead of relying on convolutions. The video explains how ViT, when pre-trained on large datasets, matches or outperforms state-of-the-art Convolutional Neural Networks on image recognition benchmarks, and examines the reasons behind that result. It also critiques the double-blind peer review process, covers how Transformers are applied to images, describes the ViT architecture in detail, and reviews the experimental results. The later sections look at what the model learns, why Transformers are displacing established approaches, and which inductive biases Transformers do and do not carry. The video closes with comments on what this means for computer vision and natural language processing.
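The "16x16 words" in the paper's title refers to cutting an image into fixed-size patches and treating each patch as one token. The following is a minimal NumPy sketch of that splitting step only, not the paper's implementation. The function name `image_to_patches` and the 224x224 RGB input size are assumptions chosen for illustration, and the learned linear projection and position embeddings that ViT applies afterwards are left out.

```python
import numpy as np


def image_to_patches(image, patch_size=16):
    """Split an (H, W, C) image into flattened, non-overlapping square patches."""
    height, width, channels = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    rows, cols = height // patch_size, width // patch_size
    # Separate each spatial axis into (grid index, offset within the patch).
    patches = image.reshape(rows, patch_size, cols, patch_size, channels)
    # Bring the two grid indices to the front: (rows, cols, patch, patch, C).
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One flattened row per patch: these rows are the "words" of the image.
    return patches.reshape(rows * cols, patch_size * patch_size * channels)


# Hypothetical input: a 224x224 RGB image (size chosen for illustration).
image = np.zeros((224, 224, 3), dtype=np.float32)
tokens = image_to_patches(image)
print(tokens.shape)  # (196, 768): 14 * 14 patches, each 16 * 16 * 3 values
```

With these assumed sizes the image becomes a sequence of 196 tokens, which is short enough for a standard Transformer to process with full self-attention.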

Syllabus

- Introduction
- Double-Blind Review is Broken
- Overview
- Transformers for Images
- Vision Transformer Architecture
- Experimental Results
- What does the Model Learn?
- Why Transformers are Ruining Everything
- Inductive Biases in Transformers
- Conclusion & Comments


Taught by

Yannic Kilcher

Related Courses

- FA17: Machine Learning (Georgia Institute of Technology via edX)
- Machine Learning (Georgia Institute of Technology via edX)
- Noether Networks - Meta-Learning Useful Conserved Quantities (Yannic Kilcher via YouTube)
- Discovering Symbolic Models from Deep Learning with Inductive Biases (Yannic Kilcher via YouTube)
- MIT EI Seminar - Phillip Isola - Emergent Intelligence - Getting More Out of Agents Than You Bake In (Massachusetts Institute of Technology via YouTube)