AudioGen: Textually Guided Audio Generation - Paper Explained
Offered By: Aleksa Gordić - The AI Epiphany via YouTube
Course Description
Overview
Dive deep into text-guided audio synthesis with this video explanation of the "AudioGen: Textually Guided Audio Generation" paper. Explore why text-to-audio generation is hard, compare AudioGen with VQ-GAN and SoundStream, and gain insights into audio representation, LSTM networks, and complex-valued STFTs. Learn about audio language modeling, multi-stream audio inputs, and data augmentation techniques, and examine the results reported in the paper.
Syllabus
Intro
Why is text-to-audio hard?
Comparison with VQ-GAN
Comparison with SoundStream
AudioGen overview
Deep dive: audio representation, LSTM
Losses explained
Complex-valued STFTs
Audio Language Modeling
Multi-stream audio inputs
Data and augmentations
Results
Outro
Taught by
Aleksa Gordić - The AI Epiphany
Related Courses
Reinforcement Learning for Trading Strategies
New York Institute of Finance via Coursera
Natural Language Processing with Sequence Models
DeepLearning.AI via Coursera
Fake News Detection with Machine Learning
Coursera Project Network via Coursera
English/French Translator: Long Short Term Memory Networks
Coursera Project Network via Coursera
Text Classification Using Word2Vec and LSTM on Keras
Coursera Project Network via Coursera