Automatic Image Captioning with Vision Transformer and GPT-2
Offered By: Eran Feit via YouTube
Course Description
Overview
Learn how to generate descriptive captions for images using Python and PyTorch in this 16-minute tutorial. Explore automatic image captioning with the pre-trained 'nlpconnect/vit-gpt2-image-captioning' model from Hugging Face, which pairs a Vision Transformer (ViT) encoder that processes the image with a GPT-2 decoder that generates the caption text. Set up the environment and install the required Python libraries, load the pre-trained models, preprocess images for the Vision Transformer, generate text with GPT-2 in PyTorch, and display each caption alongside its image. Access the tutorial code and additional computer vision resources through the provided links. Gain practical skills in implementing transformer-based image captioning with popular deep learning frameworks.
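The workflow described above (load the pre-trained model, preprocess images for ViT, generate text with GPT-2, display the results) can be sketched roughly as follows. This is a minimal sketch, not the code from the video. It assumes the Hugging Face `transformers` API (`VisionEncoderDecoderModel`, `ViTImageProcessor`, `AutoTokenizer`) plus Pillow and matplotlib, installed with something like `pip install torch transformers pillow matplotlib`. The helper names (`load_captioner`, `caption_images`, `show_captions`, `tidy_captions`) and the generation settings in `GEN_KWARGS` are hypothetical choices made for illustration.

```python
# Minimal sketch of ViT-GPT2 image captioning with Hugging Face transformers.
# Assumes torch, transformers, pillow and matplotlib are installed.
# Heavy libraries are imported inside the functions so this module can be
# imported (and its small helpers used) without loading PyTorch.
from typing import List, Sequence

MODEL_ID = "nlpconnect/vit-gpt2-image-captioning"
# Hypothetical generation settings (beam search), not values from the video.
GEN_KWARGS = {"max_length": 16, "num_beams": 4}

_cache = {}


def tidy_captions(raw: Sequence[str]) -> List[str]:
    """Remove leading/trailing whitespace from decoded captions."""
    return [text.strip() for text in raw]


def load_captioner():
    """Load (once) the ViT-encoder/GPT-2-decoder model, image processor and tokenizer."""
    if not _cache:
        import torch
        from transformers import AutoTokenizer, VisionEncoderDecoderModel, ViTImageProcessor

        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID).to(device)
        model.eval()  # inference mode: disables dropout
        _cache.update(
            model=model,
            processor=ViTImageProcessor.from_pretrained(MODEL_ID),
            tokenizer=AutoTokenizer.from_pretrained(MODEL_ID),
            device=device,
        )
    return _cache


def caption_images(image_paths: Sequence[str]) -> List[str]:
    """Return one generated caption per image path."""
    import torch
    from PIL import Image

    parts = load_captioner()
    # ViT expects 3-channel input, so convert grayscale/RGBA images to RGB.
    images = [Image.open(path).convert("RGB") for path in image_paths]
    # The processor resizes and normalises the images into a batched pixel tensor.
    pixel_values = parts["processor"](images=images, return_tensors="pt").pixel_values
    pixel_values = pixel_values.to(parts["device"])
    with torch.no_grad():
        # The GPT-2 decoder generates token ids conditioned on the ViT image features.
        output_ids = parts["model"].generate(pixel_values, **GEN_KWARGS)
    decoded = parts["tokenizer"].batch_decode(output_ids, skip_special_tokens=True)
    return tidy_captions(decoded)


def show_captions(image_paths: Sequence[str], captions: Sequence[str]) -> None:
    """Display each image side by side with its generated caption as the title."""
    import matplotlib.pyplot as plt
    from PIL import Image

    n = len(image_paths)
    fig, axes = plt.subplots(1, n, figsize=(5 * n, 5), squeeze=False)
    for ax, path, caption in zip(axes[0], image_paths, captions):
        ax.imshow(Image.open(path).convert("RGB"))
        ax.set_title(caption)
        ax.axis("off")
    plt.tight_layout()
    plt.show()
```

Usage would look like `captions = caption_images(["photo1.jpg", "photo2.jpg"])` followed by `show_captions(["photo1.jpg", "photo2.jpg"], captions)`, where the file names are placeholders. The first call downloads the model weights from the Hugging Face Hub; caching the loaded model in `_cache` avoids reloading it for every batch of images.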
Syllabus
Automatic Image Captioning with Vit-Gpt2
Taught by
Eran Feit
Related Courses
Introduction to Artificial Intelligence, Stanford University via Udacity
Computer Vision: The Fundamentals, University of California, Berkeley via Coursera
Computational Photography, Georgia Institute of Technology via Coursera
Einführung in Computer Vision, Technische Universität München (Technical University of Munich) via Coursera
Introduction to Computer Vision, Georgia Institute of Technology via Udacity