Unsupervised Learning of Spoken Language with Visual Context
Offered By: MITCBMM via YouTube
Course Description
Overview
This 34-minute talk by Jim Glass of MIT presents research on unsupervised learning of spoken language using visual context. It opens with the challenges facing automatic speech recognition and the potential of audio-visual embedding spaces for learning language without annotated data. The talk shows how deep learning models can associate images with spoken descriptions and derive word-like units from unannotated speech, and it presents experimental evaluation and analysis demonstrating the model's ability to cluster visual objects with their spoken counterparts. Topics include crowdsourcing audio-visual data, evaluation through image search and image annotation, time-varying audio-visual affiliation, audio-visual grounding for localization, and the spatial distribution of speech clusters. The talk closes with the broader implications for extending speech recognition to more of the world's languages.
Syllabus
Intro
Challenge for Automatic Speech Recognition
A Perspective on Spoken Language Processing: Most of the world's languages have not been addressed by resource- and expert-intensive supervised ...
Crossing the Vision Language Boundary
Learning an Audio/Visual Embedding Space?
Joint Audio-Visual Analysis Architecture
Crowdsourcing Audio-Visual Data
Evaluation: Image Search and Annotation
Evaluating via Image Search
Evaluating via Image Annotation
Time-varying Audio-Visual Affiliation
Audio-Visual Grounding for Localization
Examples of Audio-Visual Clusters
Cluster Analysis
Spatial Distribution of Speech Clusters
Final Message
Taught by
MITCBMM
Related Courses
Machine Learning: Unsupervised Learning
Brown University via Udacity
Practical Predictive Analytics: Models and Methods
University of Washington via Coursera
Finding Structure in Data (Поиск структуры в данных)
Moscow Institute of Physics and Technology via Coursera
Statistical Machine Learning
Carnegie Mellon University via Independent
FA17: Machine Learning
Georgia Institute of Technology via edX