YoVDO

AudioGen: Textually Guided Audio Generation - Paper Explained

Offered By: Aleksa Gordić - The AI Epiphany via YouTube

Tags

Generative Adversarial Networks (GAN) Courses
Long short-term memory (LSTM) Courses
Data Augmentation Courses
Audio generation Courses

Course Description

Overview

This video walks through the paper "AudioGen: Textually Guided Audio Generation". It covers why text-to-audio generation is hard, how AudioGen compares with VQ-GAN and SoundStream, and the details of the model: the audio representation, the LSTM, the losses, and complex-valued STFTs. It then turns to audio language modeling, multi-stream audio inputs, the data and augmentations used, and the paper's results.

Syllabus

Intro
Why is text-to-audio hard?
Comparison with VQ-GAN
Comparison with SoundStream
AudioGen overview
Deep dive: audio representation, LSTM
Losses explained
Complex-valued STFTs
Audio Language Modeling
Multi-stream audio inputs
Data and augmentations
Results
Outro


Taught by

Aleksa Gordić - The AI Epiphany
