Generative AI for Text to Audio Generation
Offered By: SAIConference via YouTube
Course Description
Overview
Explore cutting-edge advancements in generative AI for text-to-audio generation in this keynote presentation by Professor Wenwu Wang from the University of Surrey. Delve into the evolution of AI technology capable of producing soundscapes from simple text prompts, with applications across filmmaking, game design, virtual reality, and digital media. Learn about the progression from traditional methods to deep learning-based models such as AudioLDM, AudioLDM 2, Re-AudioLDM, and WavJourney, and understand how these models map and align text with audio events to create complex audio environments. Discover real-world applications ranging from sound synthesis in games and movies to assistive technology for the visually impaired. Gain insights into recent breakthroughs in cross-modal generation, key challenges, and future research directions. Experience live demonstrations and learn how to experiment with these tools on platforms like GitHub and Hugging Face. Key topics include an overview of deep generative AI for text-to-audio generation, an introduction to the key models, practical applications in sound design, and hands-on experimentation with open-source tools.
Syllabus
Generative AI for Text to Audio Generation | Wenwu Wang, University of Surrey | IntelliSys 2024
Taught by
SAIConference
Related Courses
Survey of Music Technology (Georgia Institute of Technology via Coursera)
Introduction to Programming for Musicians and Digital Artists (California Institute of the Arts via Coursera)
Sound Synthesis Using Reaktor (California Institute of the Arts via Kadenze)
Introduction to Sound and Acoustic Sketching (University St. Joseph via Kadenze)
Digital Storytelling: Filmmaking for the Web (University of Birmingham via FutureLearn)