Automatic Image Captioning with Vision Transformer and GPT-2
Offered By: Eran Feit via YouTube
Course Description
Overview
Learn how to generate descriptive captions for images using Python and PyTorch in this 16-minute tutorial. Explore the process of automatic image captioning with the pre-trained 'nlpconnect/vit-gpt2-image-captioning' model from Hugging Face. Set up the Vision Transformer (ViT) for image processing and GPT-2 for text generation. Discover how to install the necessary environment and Python libraries, load pre-trained models, process images with Vision Transformers, generate text with GPT-2 in PyTorch, and display the captioning results alongside the images. Access the tutorial code and find additional computer vision resources through provided links. Gain practical skills in implementing state-of-the-art image captioning techniques using popular deep learning frameworks.
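The workflow described above (load the pre-trained model, preprocess images with the ViT processor, generate text with GPT-2, show captions next to the images) might be sketched roughly as follows. This is not the tutorial's own code. The function names `tidy`, `caption_images`, and `show_captions`, the generation settings `max_length=16` and `num_beams=4`, and the package list in the first comment are assumptions chosen for illustration; only the model ID comes from the description.

```python
# Assumed environment: pip install torch transformers pillow matplotlib
from typing import List

# Model ID named in the course description.
MODEL_ID = "nlpconnect/vit-gpt2-image-captioning"


def tidy(captions: List[str]) -> List[str]:
    """Strip leading/trailing whitespace from decoded captions."""
    return [c.strip() for c in captions]


def caption_images(image_paths: List[str],
                   max_length: int = 16,
                   num_beams: int = 4) -> List[str]:
    """Return one caption per image path (illustrative helper, not the tutorial's)."""
    # Heavy imports are kept local so the module can be imported cheaply.
    import torch
    from PIL import Image
    from transformers import (AutoTokenizer, ViTImageProcessor,
                              VisionEncoderDecoderModel)

    # Load the ViT encoder + GPT-2 decoder, the image processor, and the tokenizer.
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)
    processor = ViTImageProcessor.from_pretrained(MODEL_ID)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.eval()

    # Convert to RGB so grayscale or RGBA files do not break preprocessing.
    images = [Image.open(path).convert("RGB") for path in image_paths]
    pixel_values = processor(images=images, return_tensors="pt").pixel_values
    pixel_values = pixel_values.to(device)

    # Beam-search generation; the settings here are assumed defaults.
    with torch.no_grad():
        output_ids = model.generate(pixel_values,
                                    max_length=max_length,
                                    num_beams=num_beams)

    captions = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    return tidy(captions)


def show_captions(image_paths: List[str]) -> None:
    """Display each image with its generated caption as the plot title."""
    import matplotlib.pyplot as plt
    from PIL import Image

    captions = caption_images(image_paths)
    fig, axes = plt.subplots(1, len(image_paths),
                             figsize=(5 * len(image_paths), 5),
                             squeeze=False)
    for ax, path, caption in zip(axes[0], image_paths, captions):
        ax.imshow(Image.open(path).convert("RGB"))
        ax.set_title(caption)
        ax.axis("off")
    plt.show()
```

Calling `show_captions(["example.jpg"])` with a local image (the file name is hypothetical) would download the model weights on first use and open a window with the image and its caption.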
Syllabus
Automatic Image Captioning with ViT-GPT2
Taught by
Eran Feit
Related Courses
Developing Generative AI Applications with Python (IBM via edX)
Create Image Captioning Models - Deutsch (Google Cloud via Coursera)
Introduction to RNN and DNN (Packt via Coursera)
Create Image Captioning Models - 한국어 (Google Cloud via Coursera)
Create Image Captioning Models - Português Brasileiro (Google Cloud via Coursera)