Lumiere: Space-Time Diffusion Model for Video Generation
Offered By: Yannic Kilcher via YouTube
Course Description
Overview
Explore a detailed explanation of Google Research's Lumiere, a text-to-video diffusion model designed to generate realistic, coherent motion in synthesized videos. Dive into the Space-Time U-Net architecture, which generates the entire video duration in a single pass and thereby overcomes limitations of existing keyframe-based approaches. Learn about the model's ability to process videos at multiple space-time scales, its state-of-the-art performance in text-to-video generation, and its versatility across content creation tasks. Examine the technical aspects, including temporal down- and up-sampling, leveraging pre-trained text-to-image models, and applications such as image-to-video conversion, video inpainting, and stylized generation. Gain insights into the training, evaluation, and potential societal impacts of this technology in the field of AI-driven video synthesis.
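To make the "multiple space-time scales" idea concrete, here is a minimal, hypothetical NumPy sketch of joint space-time downsampling. It is not Lumiere's actual implementation (the paper uses learned temporal convolutions inside a U-Net); it only illustrates how a video tensor can be compressed along the temporal axis as well as the spatial axes, so a network bottleneck can process the whole clip at a coarse space-time scale in one pass. The function name and strides are assumptions for illustration.

```python
import numpy as np

def spacetime_downsample(video, t_stride=2, s_stride=2):
    """Block-average a video tensor (T, H, W, C) over time and space.

    Hypothetical illustration of the space-time downsampling idea:
    unlike image U-Nets that pool only spatially, the clip is also
    compressed along the temporal axis.
    """
    T, H, W, C = video.shape
    # Trim so each dimension divides evenly, then block-average.
    T2, H2, W2 = T // t_stride, H // s_stride, W // s_stride
    v = video[: T2 * t_stride, : H2 * s_stride, : W2 * s_stride]
    v = v.reshape(T2, t_stride, H2, s_stride, W2, s_stride, C)
    return v.mean(axis=(1, 3, 5))

# Example: an 80-frame 64x64 RGB clip shrinks to 40 frames at 32x32.
clip = np.random.rand(80, 64, 64, 3)
coarse = spacetime_downsample(clip)
print(coarse.shape)  # (40, 32, 32, 3)
```

The matching upsampling step in the real model is learned; here it would simply be the inverse repetition along the same axes.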
Syllabus
- Introduction
- Problems with keyframes
- Space-Time U-Net (STUNet)
- Extending U-Nets to video
- MultiDiffusion for fusing SSR predictions
- Stylized generation by swapping weights
- Training & Evaluation
- Societal Impact & Conclusion
Taught by
Yannic Kilcher
Related Courses
- Introduction to Artificial Intelligence (Stanford University via Udacity)
- Probabilistic Graphical Models 1: Representation (Stanford University via Coursera)
- Artificial Intelligence for Robotics (Stanford University via Udacity)
- Computer Vision: The Fundamentals (University of California, Berkeley via Coursera)
- Learning from Data: Introductory Machine Learning course (California Institute of Technology via Independent)