BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Offered By: Yannic Kilcher via YouTube
Course Description
Overview
Explore a comprehensive review of BLIP (Bootstrapping Language-Image Pre-training), a framework for unified vision-language understanding and generation. Delve into the intricacies of cross-modal pre-training, examining how BLIP addresses two limitations of prior work: noisy web-sourced image-text training data and models that transfer well to either understanding or generation tasks, but not both. Learn about the model's architecture, data flow, and parameter sharing between modules. Discover the captioning and filtering (CapFilt) bootstrapping process, and understand how BLIP achieves state-of-the-art results on a range of vision-language tasks. Gain insights into its zero-shot application to video-language tasks and its potential impact on the field.
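The CapFilt bootstrapping discussed in the video can be summarized roughly as follows. This is a minimal Python sketch, assuming hypothetical captioner and filter_model objects standing in for BLIP's fine-tuned image-grounded text decoder and image-text matching filter; it is illustrative only, not the authors' implementation.

def bootstrap_dataset(web_pairs, clean_pairs, captioner, filter_model):
    """Build a cleaner pre-training set from noisy web image-text pairs."""
    bootstrapped = list(clean_pairs)  # human-annotated pairs are kept as-is
    for image, web_caption in web_pairs:
        synthetic_caption = captioner.generate(image)  # propose a new caption
        # The filter keeps a caption only if it judges it to match the image;
        # both the original web caption and the synthetic one are checked.
        for caption in (web_caption, synthetic_caption):
            if filter_model.matches(image, caption):
                bootstrapped.append((image, caption))
    return bootstrapped

The resulting bootstrapped dataset is then used to pre-train a new model, which is the sense in which the process "bootstraps" the training data.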
Syllabus
- Intro
- Sponsor: Zeta Alpha
- Paper Overview
- Vision-Language Pre-Training
- Contributions of the paper
- Model architecture: many parts for many tasks
- How data flows in the model
- Parameter sharing between the modules
- Captioning & Filtering bootstrapping
- Fine-tuning the model for downstream tasks
Taught by
Yannic Kilcher
Related Courses
Neural Networks for Machine Learning - University of Toronto via Coursera
機器學習技法 (Machine Learning Techniques) - National Taiwan University via Coursera
Machine Learning Capstone: An Intelligent Application with Deep Learning - University of Washington via Coursera
Прикладные задачи анализа данных (Applied Problems of Data Analysis) - Moscow Institute of Physics and Technology via Coursera
Leading Ambitious Teaching and Learning - Microsoft via edX