YoVDO

Automatic Image Captioning with Vision Transformer and GPT-2

Offered By: Eran Feit via YouTube

Tags

Image Captioning Courses Deep Learning Courses Computer Vision Courses Neural Networks Courses PyTorch Courses GPT-2 Courses Transfer Learning Courses Hugging Face Courses Vision Transformers Courses

Course Description

Overview

Save Big on Coursera Plus. 7,000+ courses at $160 off. Limited Time Only!
Learn how to generate descriptive captions for images using Python and PyTorch in this 16-minute tutorial. Explore the process of automatic image captioning with the pre-trained 'nlpconnect/vit-gpt2-image-captioning' model from Hugging Face. Set up the Vision Transformer (ViT) for image processing and GPT-2 for text generation. Discover how to install the necessary environment and Python libraries, load pre-trained models, process images with Vision Transformers, generate text with GPT-2 in PyTorch, and display the captioning results alongside the images. Access the tutorial code and find additional computer vision resources through provided links. Gain practical skills in implementing state-of-the-art image captioning techniques using popular deep learning frameworks.

Syllabus

Automatic Image Captioning with Vit-Gpt2


Taught by

Eran Feit

Related Courses

Neural Networks for Machine Learning
University of Toronto via Coursera
機器學習技法 (Machine Learning Techniques)
National Taiwan University via Coursera
Machine Learning Capstone: An Intelligent Application with Deep Learning
University of Washington via Coursera
Прикладные задачи анализа данных
Moscow Institute of Physics and Technology via Coursera
Leading Ambitious Teaching and Learning
Microsoft via edX