
Vision Transformer and Its Applications

Offered By: Open Data Science via YouTube

Tags

Computer Vision Courses, Image Processing Courses, Self-Attention Courses, Vision Transformers Courses

Course Description

Overview

Explore a 35-minute talk on Vision Transformer and its applications in computer vision. Delve into the breakthrough model architecture, focusing on self-attention and its role in vision. Examine various implementations utilizing Vision Transformer as the main backbone, including applications in recognition, detection, segmentation, multi-modal learning, and scene text recognition. Discover the potential of self-attention beyond transformers in building general-purpose model architectures capable of processing diverse data formats such as text, audio, image, and video. Learn about training techniques, including pre-training on large datasets and knowledge distillation. Investigate the model's performance in semantic segmentation and medical image segmentation, along with its efficiency in parameters, FLOPs, and speed. Understand the limitations of Vision Transformers and gain insights into recommended open-source implementations.

Syllabus

Intro
Vision Transformer (ViT) and its Applications
Why It Matters
Human Visual Attention
Attention is Dot Product between 2 Features
In Natural Language Processing
Image to Patches
Linear Projection - Patches to Features
Vision Transformer is Invariant to Position of Patches
Position Embedding
Learnable Class Embedding
Why Layer Norm?
Why Skip Connection?
Why Multi-Head Self-Attention?
A Transformer Encoder is Made of L Encoder Modules Stacked Together
Version based on Layers, MLP size, MSA heads
Pre-training on a large dataset, fine-tune on the target dataset
Training by Knowledge Distillation (DeiT)
Semantic Segmentation (mIoU: 50.3 SETR vs baseline PSPNet on ADE20K)
Semantic Segmentation (mIoU: 84.4 SegFormer vs 82.2 SETR on Cityscapes)
Vision Transformer for STR (ViTSTR)
Parameter, FLOPs, and Speed Efficiency
Medical Image Segmentation (DSC: 77.5 TransUNet vs 71.3 R50-ViT baseline)
Limitations
Recommended Open-Source Implementations of ViT
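
The early syllabus items (image to patches, linear projection, learnable class embedding, position embedding, attention as a dot product) can be illustrated with a rough NumPy sketch. This is not code from the talk: the function names, the 16-pixel patch size, the 64-dimensional features, and the random weights standing in for learned parameters are all assumptions chosen for illustration, and only a single attention head is shown.

```python
import numpy as np


def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    rows, cols = h // patch_size, w // patch_size
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (rows, cols, p, p, C)
    return patches.reshape(rows * cols, patch_size * patch_size * c)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention: scaled dot products between features."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


def embed_and_attend(image, patch_size=16, dim=64, seed=0):
    """Hypothetical helper: patches -> tokens -> one attention layer."""
    rng = np.random.default_rng(seed)
    patches = image_to_patches(image, patch_size)

    # Linear projection: patches to features (random stand-in weights).
    w_embed = rng.normal(0.0, 0.02, (patches.shape[1], dim))
    tokens = patches @ w_embed

    # Class embedding prepended as an extra token.
    cls_token = rng.normal(0.0, 0.02, (1, dim))
    tokens = np.concatenate([cls_token, tokens], axis=0)

    # Position embedding added so patch order is no longer ignored.
    tokens = tokens + rng.normal(0.0, 0.02, tokens.shape)

    w_q, w_k, w_v = (rng.normal(0.0, 0.02, (dim, dim)) for _ in range(3))
    out, weights = self_attention(tokens, w_q, w_k, w_v)
    return tokens, out, weights


if __name__ == "__main__":
    image = np.random.default_rng(1).random((224, 224, 3))
    tokens, out, weights = embed_and_attend(image)
    print(tokens.shape, out.shape, weights.shape)
```

With a 224x224 RGB input and 16-pixel patches, the sketch yields 196 patch tokens plus one class token. Without the position-embedding step, shuffling the patches would only shuffle the attention outputs in the same way, which is the property the "Invariant to Position of Patches" item refers to.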


Taught by

Open Data Science
