VirTex: Learning Visual Representations from Textual Annotations

Offered By: Yannic Kilcher via YouTube

Tags

- Image Captioning Courses
- Computer Vision Courses
- Fine-Tuning Courses

Course Description

Overview

This video walks through the VirTex paper, which proposes using textual annotations as the supervision signal for visual transfer learning. Instead of pre-training on class labels or on unlabeled images, VirTex trains a convolutional neural network from scratch on high-quality image captions, and the video compares this with conventional supervised and unsupervised pre-training. It covers the quality-quantity tradeoff in visual representation learning (a smaller set of richly annotated images versus a larger set of sparsely annotated or unlabeled ones), the image captioning task used for pre-training, and how the VirTex method is implemented. It then goes through the results: linear classification on frozen features, ablation studies, fine-tuning experiments, and attention visualization. The paper reports features that match or exceed ImageNet-based pre-training on downstream computer vision tasks while using up to ten times fewer images.
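
To make the captioning pre-training signal concrete, here is a minimal sketch of a caption loss. It assumes the objective described in the VirTex paper, in which the caption is predicted token by token both left-to-right and right-to-left and the two losses are summed; this listing does not spell that out. The function names and the hand-written logits are hypothetical stand-ins for what a visual backbone plus text head would produce, not the authors' code.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vector of vocabulary logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def caption_nll(step_logits, target_ids):
    """Sum of negative log-likelihoods of the target tokens.

    step_logits: one logit vector per caption position (hypothetical model output).
    target_ids:  the token id expected at each position.
    """
    if len(step_logits) != len(target_ids):
        raise ValueError("need exactly one logit vector per target token")
    return -sum(log_softmax(l)[t] for l, t in zip(step_logits, target_ids))


def bicaptioning_loss(forward_logits, backward_logits, caption_ids):
    """Forward loss on the caption plus backward loss on the reversed caption.

    Assumption: both directions are weighted equally and simply added.
    """
    forward = caption_nll(forward_logits, caption_ids)
    backward = caption_nll(backward_logits, caption_ids[::-1])
    return forward + backward


if __name__ == "__main__":
    # Toy vocabulary of 4 tokens and a 3-token caption.
    caption = [2, 0, 3]
    uninformed = [[0.0] * 4 for _ in caption]  # uniform predictions
    print(bicaptioning_loss(uninformed, uninformed, caption))
```

In a real training loop the logits would depend on the image, so lowering this loss forces the backbone to encode whatever the captions mention. That is the sense in which captions act as a denser supervision signal than a single class label.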

Syllabus

- Intro & Overview
- Pre-Training for Visual Tasks
- Quality-Quantity Tradeoff
- Image Captioning
- VirTex Method
- Linear Classification
- Ablations
- Fine-Tuning
- Attention Visualization
- Conclusion & Remarks


Taught by

Yannic Kilcher
