YoVDO

Vision Transformers Explained + Fine-Tuning in Python

Offered By: James Briggs via YouTube

Tags

Computer Vision Courses, Artificial Intelligence Courses, Machine Learning Courses, Deep Learning Courses, Python Courses, Image Classification Courses, Transformers Courses, Attention Mechanisms Courses, Vision Transformers Courses

Course Description

Overview

Explore the Vision Transformer (ViT) model in this tutorial video. Build intuition for how ViT works: it applies the transformer architecture, originally developed for natural language processing, to images by treating each image as a sequence of patches. Learn about attention mechanisms, image patch embeddings, and the other key components that make ViT effective. Then follow a hands-on Python implementation of image classification with the Hugging Face transformers library, covering environment setup, initializing the ViT Feature Extractor, configuring the Hugging Face Trainer, and evaluating model performance. Suited to anyone interested in recent developments in computer vision and in how techniques from natural language processing carry over to it.
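To make the attention mechanism mentioned above concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention, softmax(QKᵀ/√d_k)V, as defined for the original Transformer. It is an illustration, not code from the video: the function names, the toy sizes, and the random projection matrices (learned parameters in a real model) are all assumptions, and ViT itself uses several such heads in parallel.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x: (num_tokens, d_model) token embeddings, e.g. image patch embeddings.
    w_q, w_k, w_v: (d_model, d_k) projections (random here, learned in practice).
    Returns the attended outputs (num_tokens, d_k) and the attention
    weights (num_tokens, num_tokens); each row of the weights sums to 1.
    """
    q = x @ w_q  # queries
    k = x @ w_k  # keys
    v = x @ w_v  # values
    # Similarity of every token with every other token, scaled by sqrt(d_k).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Turn each row of scores into a probability distribution over tokens.
    weights = softmax(scores, axis=-1)
    # Each output token is a weighted mix of all value vectors.
    return weights @ v, weights


rng = np.random.default_rng(0)
num_tokens, d_model, d_k = 5, 8, 4
x = rng.normal(size=(num_tokens, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)  # (5, 4) (5, 5)
```

Because every token attends to every other token, each output row can draw on information from the whole input at once, which is what lets ViT relate distant image patches in a single layer.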

Syllabus

Intro
In this video
What are transformers and attention?
Attention explained simply
Attention used in CNNs
Transformers and attention
What the Vision Transformer (ViT) does differently
Images to patch embeddings
1. Building image patches
2. Linear projection
3. Learnable class embedding
4. Adding positional embeddings
ViT implementation in python with Hugging Face
Packages, dataset, and Colab GPU
Initialize Hugging Face ViT Feature Extractor
Hugging Face Trainer setup
Training and CUDA device error
Evaluation and classification predictions with ViT
Final thoughts
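The four patch-embedding steps listed in the syllabus can be sketched in NumPy as follows. This is a hypothetical illustration rather than the video's implementation: it assumes the ViT-Base configuration (224×224 RGB input, 16×16 patches, 768-dimensional embeddings), and the function names are invented for this example. Random arrays stand in for the projection matrix, class embedding, and positional embeddings that ViT learns during training.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Step 1: split an (H, W, C) image into flattened, non-overlapping patches.

    Returns shape (num_patches, patch_size * patch_size * C), with patches
    ordered row by row across the image.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, p, p, c)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def embed_image(image, patch_size, w_proj, cls_token, pos_emb):
    """Steps 2-4: linear projection, prepend class embedding, add positions."""
    patches = image_to_patches(image, patch_size)         # (N, P*P*C)
    tokens = patches @ w_proj                              # 2. linear projection -> (N, D)
    tokens = np.concatenate([cls_token, tokens], axis=0)   # 3. prepend class token -> (N + 1, D)
    return tokens + pos_emb                                # 4. add positional embeddings


rng = np.random.default_rng(0)
image_size, patch_size, channels, d_model = 224, 16, 3, 768
num_patches = (image_size // patch_size) ** 2   # 14 * 14 = 196
patch_dim = patch_size * patch_size * channels  # 16 * 16 * 3 = 768

# Random stand-ins for parameters that ViT learns during training.
w_proj = rng.normal(scale=0.02, size=(patch_dim, d_model))
cls_token = rng.normal(scale=0.02, size=(1, d_model))
pos_emb = rng.normal(scale=0.02, size=(num_patches + 1, d_model))

image = rng.random((image_size, image_size, channels))
embeddings = embed_image(image, patch_size, w_proj, cls_token, pos_emb)
print(embeddings.shape)  # (197, 768)
```

The result is a sequence of 197 tokens (196 patches plus the class token) that a standard transformer encoder can process; for classification, the final state of the class token is passed to the classifier head. Implementations such as Hugging Face's ViT compute the projection with a convolution whose kernel size and stride both equal the patch size, which is equivalent to a linear layer applied to each flattened patch.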


Taught by

James Briggs
