YoVDO

AudioGen: Textually Guided Audio Generation - Paper Explained

Offered By: Aleksa Gordić - The AI Epiphany via YouTube

Tags

Generative Adversarial Networks (GAN) Courses
Long short-term memory (LSTM) Courses
Data Augmentation Courses
Audio generation Courses

Course Description

Overview

A video walkthrough of the paper "AudioGen: Textually Guided Audio Generation". It covers why text-to-audio generation is hard, how AudioGen compares with VQ-GAN and SoundStream, and the model's building blocks: the audio representation, LSTM, the training losses, and complex-valued STFTs. It then turns to audio language modeling, multi-stream audio inputs, the data and augmentations used, and the reported results.

Syllabus

Intro
Why is text-to-audio hard?
Comparison with VQ-GAN
Comparison with SoundStream
AudioGen overview
Deep dive: audio representation, LSTM
Losses explained
Complex-valued STFTs
Audio Language Modeling
Multi-stream audio inputs
Data and augmentations
Results
Outro


Taught by

Aleksa Gordić - The AI Epiphany

Related Courses

MusicLM Generates Music From Text - Paper Breakdown
Valerio Velardo - The Sound of AI via YouTube
A Composer's Guide to Creating with Generative Neural Networks
GOTO Conferences via YouTube
21 Recent AI Updates in 23 Minutes
1littlecoder via YouTube
Popcorn & Clocks - A Story About Scheduling in the Browser
NDC Conferences via YouTube
Monotron - A 1980s Style Home Computer Written in Rust
ACCU Conference via YouTube