Vision Transformer and Its Applications

Offered By: Open Data Science via YouTube

Tags

Computer Vision Courses
Image Processing Courses
Self-Attention Courses
Vision Transformers Courses

Course Description

Overview

Explore a 35-minute talk on the Vision Transformer (ViT) and its applications in computer vision. Delve into this breakthrough model architecture, focusing on self-attention and its role in vision. Examine implementations that use ViT as the main backbone, including applications in recognition, detection, segmentation, multi-modal learning, and scene text recognition. Discover the potential of self-attention beyond transformers for building general-purpose model architectures capable of processing diverse data formats such as text, audio, image, and video. Learn about training techniques, including pre-training on large datasets and knowledge distillation. Investigate the model's performance in semantic segmentation and medical image segmentation, as well as its parameter, FLOPS, and speed efficiency. Understand the limitations of Vision Transformers and gain insights into recommended open-source implementations.
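
To make the self-attention idea concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the operation the talk builds on. The function name, weight matrices, and toy sizes are illustrative assumptions, not taken from the talk.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Project the input features (n, d) to queries, keys, and values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Attention score = dot product between query and key features,
    # scaled by sqrt(d) to keep the softmax well-behaved.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    # Each output is an attention-weighted sum of the value features.
    return weights @ V

# Toy example: 4 patch features of dimension 8 (hypothetical sizes).
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)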

Syllabus

Intro
Vision Transformer (ViT) and Its Applications
Why It Matters
Human Visual Attention
Attention Is a Dot Product Between Two Features
In Natural Language Processing
Image to Patches
Linear Projection - Patches to Features
Vision Transformer is Invariant to Position of Patches
Position Embedding
Learnable Class Embedding
Why Layer Norm?
Why Skip Connection?
Why Multi-Head Self-Attention?
A Transformer Encoder is Made of L Encoder Modules Stacked Together (see the end-to-end sketch after this syllabus)
Versions Based on Layers, MLP Size, MSA Heads
Pre-training on a Large Dataset, Fine-tuning on the Target Dataset
Training by Knowledge Distillation (DeiT)
Semantic Segmentation (mIoU: 50.3 SETR vs baseline PSPNet on ADE20K)
Semantic Segmentation (mIoU: 84.4 SegFormer vs 82.2 SETR on Cityscapes)
Vision Transformer for Scene Text Recognition (ViTSTR)
Parameter, FLOPS, and Speed Efficiency
Medical Image Segmentation (DSC: 77.5 TransUNet vs 71.3 R50-ViT baseline)
Limitations
Recommended Open-Source Implementations of ViT
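
As a companion to the syllabus, here is a minimal PyTorch sketch of the end-to-end ViT pipeline it outlines: image to patches, linear projection, a learnable class embedding, position embeddings, and L stacked encoder modules. All class names, dimensions, and hyperparameters here are illustrative assumptions, not the configuration used in the talk.

import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """Minimal ViT: image -> patches -> linear projection -> [class] token
    + position embeddings -> L stacked encoder modules -> class logits."""

    def __init__(self, image_size=32, patch_size=8, dim=64, depth=4,
                 heads=4, num_classes=10):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # Linear projection of flattened patches, implemented as a strided conv.
        self.to_patches = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        # Learnable class embedding, prepended to the patch sequence.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        # Position embedding: self-attention alone is invariant to patch order.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        # Encoder layer with multi-head self-attention, LayerNorm, and skip connections.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)  # L = depth
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):                                        # x: (batch, 3, H, W)
        patches = self.to_patches(x).flatten(2).transpose(1, 2)  # (batch, N, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        z = torch.cat([cls, patches], dim=1) + self.pos_embed
        z = self.encoder(z)
        return self.head(z[:, 0])  # classify from the class token

logits = TinyViT()(torch.randn(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 10])

The DeiT-style knowledge distillation mentioned in the syllabus would add a second, learnable distillation token trained against a teacher's predictions; it is omitted here to keep the sketch short.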


Taught by

Open Data Science

Related Courses

Transformers: Text Classification for NLP Using BERT
LinkedIn Learning
TensorFlow: Working with NLP
LinkedIn Learning
TransGAN - Two Transformers Can Make One Strong GAN - Machine Learning Research Paper Explained
Yannic Kilcher via YouTube
Nyströmformer - A Nyström-Based Algorithm for Approximating Self-Attention
Yannic Kilcher via YouTube
Recreate Google Translate - Model Training
Edan Meyer via YouTube