NÜWA - Visual Synthesis Pre-training for Neural Visual World Creation
Offered By: Yannic Kilcher via YouTube
Course Description
Overview
Explore a comprehensive explanation of the NÜWA research paper, which introduces a unified multimodal pre-trained model for visual synthesis tasks. Delve into the architecture's ability to process text, images, and videos using a 3D transformer encoder-decoder framework and the novel 3D Nearby Attention mechanism. Learn about the model's applications in text-to-image generation, text-guided video manipulation, and sketch-to-video tasks. Examine the shared latent space creation, latent representation transformation, and pre-training objectives. Analyze experimental results across eight different visual generation tasks and gain insights into the model's state-of-the-art performance and zero-shot capabilities.
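The core idea behind 3D Nearby Attention is straightforward to sketch: images and videos are first encoded into a (time, height, width) grid of discrete latent tokens, and each token attends only to tokens within a fixed local extent along each axis instead of the full sequence, cutting the attended positions per token from N down to roughly (2·eₜ+1)(2·eₕ+1)(2·e_w+1). Below is a minimal single-head PyTorch sketch of that idea; the function names, the `extent` values, and the dense boolean mask are illustrative choices for clarity, not the paper's implementation, which avoids materializing the full N×N attention matrix.

```python
import torch
import torch.nn.functional as F

def nearby_mask(T, H, W, extent):
    """Boolean (N, N) mask: True where token j lies within the local
    3D extent of token i along every axis (time, height, width)."""
    et, eh, ew = extent
    coords = torch.stack(torch.meshgrid(
        torch.arange(T), torch.arange(H), torch.arange(W), indexing="ij"
    ), dim=-1).reshape(-1, 3)                    # (N, 3) grid coordinates
    diff = (coords[:, None, :] - coords[None, :, :]).abs()  # (N, N, 3)
    return (diff[..., 0] <= et) & (diff[..., 1] <= eh) & (diff[..., 2] <= ew)

def nearby_self_attention(x, extent=(1, 2, 2)):
    """x: (T, H, W, d) grid of latent token embeddings."""
    T, H, W, d = x.shape
    q = k = v = x.reshape(-1, d)       # single head, no learned projections
    scores = (q @ k.t()) / d ** 0.5    # dense (N, N) scores for clarity only
    scores = scores.masked_fill(~nearby_mask(T, H, W, extent), float("-inf"))
    return (F.softmax(scores, dim=-1) @ v).reshape(T, H, W, d)

# Example: a 4-frame clip of 8x8 latent tokens with 64-dim embeddings.
x = torch.randn(4, 8, 8, 64)
y = nearby_self_attention(x)           # same shape as x: (4, 8, 8, 64)
```

Every position always falls inside its own neighborhood, so no row of the score matrix is fully masked; the same masking pattern extends to cross-attention between modalities in the paper's unified 3DNA formulation.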
Syllabus
- Intro & Outline
- Sponsor: ClearML
- Tasks & Naming
- The problem with recurrent image generation
- Creating a shared latent space w/ Vector Quantization
- Transforming the latent representation
- Recap: Self- and Cross-Attention
- 3D Nearby Self-Attention
- Pre-Training Objective
- Experimental Results
- Conclusion & Comments
Taught by
Yannic Kilcher
Related Courses
- Neural Networks for Machine Learning (University of Toronto via Coursera)
- Good Brain, Bad Brain: Basics (University of Birmingham via FutureLearn)
- Statistical Learning with R (Stanford University via edX)
- Machine Learning 1—Supervised Learning (Brown University via Udacity)
- Fundamentals of Neuroscience, Part 2: Neurons and Networks (Harvard University via edX)