Fine-tuning Multi-modal Video and Text Models
Offered By: Trelis Research via YouTube
Course Description
Overview
Learn how to fine-tune multi-modal video and text models in this comprehensive tutorial. Explore techniques for clipping and querying videos using an IDEFICS 2 endpoint, generating datasets for video fine-tuning, and pushing them to a hub. Discover methods for image splitting in Jupyter Notebooks and understand the IDEFICS 2 vision-to-text adapter architecture. Follow along as the instructor demonstrates loading video datasets for fine-tuning and recaps the entire process. Gain insight into transforming image and text models into powerful video and text analysis tools.
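Clipping a video before querying an image-and-text model like IDEFICS 2 usually means sampling a handful of frames spread across the clip. As a minimal sketch of that step (the function name and parameters are illustrative, not from the course; actual frame decoding would use a library such as OpenCV or decord):

```python
# Sketch: uniform frame-index sampling for turning a video clip into a
# small set of images to send to a vision-language model.
# All names here are illustrative assumptions, not the course's own code.

def sample_frame_indices(total_frames: int, num_frames: int) -> list[int]:
    """Pick `num_frames` indices spread evenly across the clip."""
    if num_frames >= total_frames:
        return list(range(total_frames))
    step = total_frames / num_frames
    # Take the midpoint of each of the num_frames equal segments.
    return [int(step * i + step / 2) for i in range(num_frames)]

# A 10-second clip at 30 fps, reduced to 8 representative frames.
indices = sample_frame_indices(total_frames=300, num_frames=8)
print(indices)  # → [18, 56, 93, 131, 168, 206, 243, 281]
```

The sampled frames would then be attached to a text prompt and sent to the model endpoint as a multi-image query.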
Syllabus
"Video + Text" from "Image + Text" models
Clipping and Querying Videos with an IDEFICS 2 endpoint
Fine-tuning video + text models
Dataset generation for video fine-tuning + pushing to hub
Clipping and querying videos with image splitting in a Jupyter Notebook
Side-note - IDEFICS 2 vision to text adapter architecture
Video clip notebook evaluation - continued
Loading a video dataset for fine-tuning
Recap of video + text model fine-tuning
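The dataset-generation step in the syllabus amounts to pairing sampled frames with question/answer text and uploading the result. A minimal sketch, assuming a frames-plus-QA record layout (the field names and example data are hypothetical, not taken from the course):

```python
# Sketch: assembling records for a video fine-tuning dataset.
# Field names ("clip_id", "frames", etc.) and the example data are
# illustrative assumptions; real records would reference actual frames.
import json

def make_record(clip_id: str, frame_paths: list[str],
                question: str, answer: str) -> dict:
    return {
        "clip_id": clip_id,
        "frames": frame_paths,
        "question": question,
        "answer": answer,
    }

records = [
    make_record(
        clip_id="clip_000",
        frame_paths=[f"clip_000/frame_{i:02d}.jpg" for i in range(8)],
        question="What is happening in this clip?",
        answer="A person is assembling a bicycle.",
    )
]

# Serialise to JSON Lines as a local checkpoint; from here,
# `datasets.Dataset.from_list(records).push_to_hub(...)` would upload
# the dataset to a hub (requires authentication).
jsonl = "\n".join(json.dumps(r) for r in records)
```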
Taught by
Trelis Research
Related Courses
Introduction to Artificial Intelligence (Stanford University via Udacity)
Computer Vision: The Fundamentals (University of California, Berkeley via Coursera)
Computational Photography (Georgia Institute of Technology via Coursera)
Einführung in Computer Vision (Technische Universität München via Coursera)
Introduction to Computer Vision (Georgia Institute of Technology via Udacity)