Emu Video Generation - From MAE Pre-training to Multimodal Embeddings
Offered By: Aleksa Gordić - The AI Epiphany via YouTube
Course Description
Overview
Explore a 55-minute talk featuring Ishan Misra from Meta discussing self-supervised learning and multimodal data, with a focus on the recent Emu Video project. Dive into topics including the effectiveness of MAE pre-training for billion-scale pretraining, ImageBind's unified embedding approach, and the Emu Video generation model. Learn about qualitative comparisons and human evaluations of the generated videos, and gain insights from the Q&A session. Discover cutting-edge developments in computer vision, multimodal AI, and video generation techniques through this comprehensive discussion.
Syllabus
00:00 - Intro
00:42 - Sponsor segment: Hyperstack GPUs
02:23 - Talk intro
04:42 - The effectiveness of MAE pre-training for billion-scale pretraining
12:58 - ImageBind: One Embedding to Rule Them All
29:26 - Emu Video
50:39 - Qualitative comparisons and human evaluation
54:30 - Q&A / outro
Taught by
Aleksa Gordić - The AI Epiphany