YoVDO

Fine-tuning Multi-modal Video and Text Models

Offered By: Trelis Research via YouTube

Tags

Fine-Tuning Courses
Computer Vision Courses
Jupyter Notebooks Courses
Model Evaluation Courses
Hugging Face Courses

Course Description

Overview

Learn how to fine-tune multi-modal video and text models in this tutorial. It covers clipping and querying videos using an IDEFICS 2 endpoint, generating datasets for video fine-tuning and pushing them to a hub, and clipping and querying videos with image splitting in a Jupyter Notebook. A side note explains the IDEFICS 2 vision-to-text adapter architecture. The instructor then demonstrates loading a video dataset for fine-tuning and closes with a recap of how an image and text model is adapted into a video and text model.

Syllabus

"Video + Text" from "Image + Text" models
Clipping and Querying Videos with an IDEFICS 2 endpoint
Fine-tuning video + text models
Dataset generation for video fine-tuning + pushing to hub
Clipping and querying videos with image splitting in a Jupyter Notebook
Side note - IDEFICS 2 vision-to-text adapter architecture
Video clip notebook evaluation - continued
Loading a video dataset for fine-tuning
Recap of video + text model fine-tuning
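
One common way to get "Video + Text" from an "Image + Text" model is to pass the model a small number of still frames sampled from a clip. This listing does not include the course's code, so the following is only a minimal sketch under that assumption. The function name `sample_frame_indices` and its parameters are hypothetical and not taken from the course; it only computes which frame numbers to extract and leaves decoding and model calls to whatever tooling is used.

```python
def sample_frame_indices(start_s, end_s, fps, num_frames):
    """Pick up to num_frames evenly spaced frame indices from a clip.

    Hypothetical helper (not from the course): the clip spans
    [start_s, end_s) seconds of a video recorded at `fps` frames per second.
    """
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")

    first = int(start_s * fps)   # first frame inside the clip
    last = int(end_s * fps)      # one past the last frame of the clip
    total = last - first
    if total < 1:
        raise ValueError("clip contains no frames at this fps")

    # Never ask for more frames than the clip contains.
    n = min(num_frames, total)
    step = total / n

    # Split the clip into n equal segments and take the middle frame of each.
    return [first + int(step * (i + 0.5)) for i in range(n)]


if __name__ == "__main__":
    # A 10-second clip at 30 fps, reduced to 5 frames.
    print(sample_frame_indices(0, 10, 30, 5))
```

Taking the middle of each segment, rather than its first frame, avoids always sampling the very first frame of the clip; this is a design choice of the sketch, not something stated in the course description.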


Taught by

Trelis Research

Related Courses

TensorFlow: Working with NLP
LinkedIn Learning
Introduction to Video Editing - Video Editing Tutorials
Great Learning via YouTube
HuggingFace Crash Course - Sentiment Analysis, Model Hub, Fine Tuning
Python Engineer via YouTube
GPT3 and Finetuning the Core Objective Functions - A Deep Dive
David Shapiro ~ AI via YouTube
How to Build a Q&A AI in Python - Open-Domain Question-Answering
James Briggs via YouTube