NÜWA - Visual Synthesis Pre-training for Neural Visual World Creation

Offered By: Yannic Kilcher via YouTube

Tags

Machine Learning Courses, Neural Networks Courses

Course Description

Overview

Explore a comprehensive explanation of the NÜWA research paper, which introduces a unified multimodal pre-trained model for visual synthesis tasks. Delve into the architecture's ability to process text, images, and videos using a 3D transformer encoder-decoder framework and the novel 3D Nearby Attention mechanism. Learn about the model's applications in text-to-image generation, text-guided video manipulation, and sketch-to-video tasks. Examine the shared latent space creation, latent representation transformation, and pre-training objectives. Analyze experimental results across eight different visual generation tasks and gain insights into the model's state-of-the-art performance and zero-shot capabilities.
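The shared latent space described above is built with vector quantization: an encoder's continuous feature vectors are snapped to their nearest entries in a learned codebook, turning images and videos into discrete tokens the transformer can model. The following is a minimal NumPy sketch of that quantization step (function name, shapes, and the toy data are illustrative, not the paper's implementation):

```python
import numpy as np

def vector_quantize(features, codebook):
    """Map each continuous feature vector to its nearest codebook
    entry (squared L2 distance), as in VQ-VAE / VQ-GAAN-style tokenizers.

    features: (n, d) array of encoder outputs
    codebook: (k, d) array of learned embeddings
    returns:  (n,) discrete token indices, (n, d) quantized vectors
    """
    # Pairwise squared distances between every feature and every code
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]

# Toy example: 4 feature vectors, codebook of 3 entries in 2-D
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 2))
codes = rng.normal(size=(3, 2))
idx, quantized = vector_quantize(feats, codes)
```

The resulting index sequence is what the 3D transformer decoder actually predicts, one discrete token at a time.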

Syllabus

- Intro & Outline
- Sponsor: ClearML
- Tasks & Naming
- The problem with recurrent image generation
- Creating a shared latent space w/ Vector Quantization
- Transforming the latent representation
- Recap: Self- and Cross-Attention
- 3D Nearby Self-Attention
- Pre-Training Objective
- Experimental Results
- Conclusion & Comments
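The 3D Nearby Self-Attention covered in the syllabus restricts each token to attending only to tokens within a local extent along the temporal, height, and width axes, rather than to the full sequence. A small sketch of the idea, building a boolean attention mask over a flattened (T, H, W) token grid (names and extents are illustrative, not the paper's exact formulation):

```python
import numpy as np

def nearby_mask(T, H, W, et, eh, ew):
    """Boolean mask for 3D nearby attention: token (t, h, w) may
    attend to token (t', h', w') only if each coordinate differs
    by at most the corresponding extent (et, eh, ew)."""
    coords = np.array([(t, h, w) for t in range(T)
                                 for h in range(H)
                                 for w in range(W)])
    # Absolute coordinate offsets between every pair of tokens
    diff = np.abs(coords[:, None, :] - coords[None, :, :])
    return (diff[..., 0] <= et) & (diff[..., 1] <= eh) & (diff[..., 2] <= ew)

# 2 frames of 3x3 tokens, local window of extent 1 on every axis
mask = nearby_mask(T=2, H=3, W=3, et=1, eh=1, ew=1)
```

Because each query only sees a fixed-size neighborhood, the attention cost scales with the window volume instead of the full video length, which is what makes long autoregressive video generation tractable.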


Taught by

Yannic Kilcher

Related Courses

Neural Networks for Machine Learning
University of Toronto via Coursera
Good Brain, Bad Brain: Basics
University of Birmingham via FutureLearn
Statistical Learning with R
Stanford University via edX
Machine Learning 1—Supervised Learning
Brown University via Udacity
Fundamentals of Neuroscience, Part 2: Neurons and Networks
Harvard University via edX