YoVDO

Automatic Image Captioning with Vision Transformer and GPT-2

Offered By: Eran Feit via YouTube

Tags

Image Captioning Courses, Deep Learning Courses, Computer Vision Courses, Neural Networks Courses, PyTorch Courses, GPT-2 Courses, Transfer Learning Courses, Hugging Face Courses, Vision Transformers Courses

Course Description

Overview

Learn how to generate descriptive captions for images using Python and PyTorch in this 16-minute tutorial. The tutorial walks through automatic image captioning with the pre-trained 'nlpconnect/vit-gpt2-image-captioning' model from Hugging Face, which pairs a Vision Transformer (ViT) that encodes the image with a GPT-2 decoder that generates the caption text. Topics covered: installing the environment and the required Python libraries, loading the pre-trained model, processing images with the Vision Transformer, generating text with GPT-2 in PyTorch, and displaying the resulting captions alongside the images. The tutorial code and additional computer vision resources are available through the provided links. By the end, you will have practical experience running a pre-trained image captioning model with PyTorch and Hugging Face.
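The workflow described above (load the pre-trained model, process images with ViT, generate text with GPT-2, display the results) might look roughly like the sketch below. This is not the tutorial's own code. It assumes the torch, transformers, Pillow, and matplotlib packages are installed, and the helper names (load_captioner, caption_images, clean_captions, show_with_captions) as well as the generation settings max_length=16 and num_beams=4 are illustrative assumptions.

```python
from typing import List

# Model identifier named in the course description.
MODEL_ID = "nlpconnect/vit-gpt2-image-captioning"


def clean_captions(raw: List[str]) -> List[str]:
    """Strip leading/trailing whitespace from decoded captions."""
    return [caption.strip() for caption in raw]


def load_captioner(model_id: str = MODEL_ID):
    """Load the ViT encoder + GPT-2 decoder, image processor, and tokenizer."""
    # Heavy imports are kept inside the functions so this file can be
    # imported without triggering a model download.
    import torch
    from transformers import (
        AutoTokenizer,
        VisionEncoderDecoderModel,
        ViTImageProcessor,
    )

    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    processor = ViTImageProcessor.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.eval()
    return model, processor, tokenizer, device


def caption_images(paths, model, processor, tokenizer, device,
                   max_length: int = 16, num_beams: int = 4) -> List[str]:
    """Return one caption per image path (generation settings are assumptions)."""
    import torch
    from PIL import Image

    images = [Image.open(path).convert("RGB") for path in paths]
    # ViT step: resize/normalize the images into a pixel tensor.
    pixel_values = processor(images=images, return_tensors="pt").pixel_values
    pixel_values = pixel_values.to(device)
    # GPT-2 step: beam-search decode token ids conditioned on the image features.
    with torch.no_grad():
        output_ids = model.generate(
            pixel_values, max_length=max_length, num_beams=num_beams
        )
    decoded = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    return clean_captions(decoded)


def show_with_captions(paths, captions) -> None:
    """Display each image with its caption as the plot title."""
    import matplotlib.pyplot as plt
    from PIL import Image

    fig, axes = plt.subplots(1, len(paths), figsize=(5 * len(paths), 5),
                             squeeze=False)
    for ax, path, caption in zip(axes[0], paths, captions):
        ax.imshow(Image.open(path).convert("RGB"))
        ax.set_title(caption)
        ax.axis("off")
    plt.show()
```

To try it, call load_captioner() once, pass its four return values together with a list of local image paths (for example a hypothetical ["photo.jpg"]) to caption_images, and hand the paths and returned captions to show_with_captions. The first call downloads the model weights from the Hugging Face Hub, so it needs network access.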

Syllabus

Automatic Image Captioning with ViT-GPT2


Taught by

Eran Feit

Related Courses

The AI Engineer Path
Scrimba
Developing Generative AI Applications with Python
IBM via edX
Models and Platforms for Generative AI
IBM via edX
Intro to Hugging Face
Codecademy
Large Language Models: Application through Production
Databricks via edX