YoVDO

Visual Features for Context-Aware Speech Recognition - 2016

Offered By: Center for Language & Speech Processing (CLSP), JHU via YouTube

Tags

Speech Recognition Courses
Machine Learning Courses
Computer Vision Courses
Deep Neural Networks Courses

Course Description

Overview

Explore cutting-edge techniques for improving automatic speech recognition in challenging multimedia content through this comprehensive lecture by Florian Metze from Carnegie Mellon University. Delve into methods for adapting acoustic and language models using visual context from video, such as detected objects and scenes. Learn about experiments on "how-to" videos that demonstrate reduced word error rates by incorporating visual information. Examine approaches for handling speech variability, speaker-microphone distance, and audio-visual fusion. Gain insights into applications for robotics, human-computer interaction, and large-scale multimedia indexing. Discover how this research aims to bridge the gap between video-to-text and speech-to-text communities.

Syllabus

Intro
Outline
Automatic Speech Recognition
Speech Variability (Spectral)
Decoding Procedure
Experimental Setup
Simple Extensions
Performance on Switchboard
IARPA "Aladdin" Project
Speaker Microphone Distance (SMD)
Training SMD Extractors
Training SMD Descriptors
SMD Results
SMD Analysis
Audio-Visual ASR
Speaker Attributes
Speaker Actions
Semantic Indexing CNN Features
Fusion of Approaches
Analysis "indoor" vs "outdoor"
Summary


Taught by

Center for Language & Speech Processing (CLSP), JHU

Related Courses

Introduction to Artificial Intelligence
Stanford University via Udacity
Computer Vision: The Fundamentals
University of California, Berkeley via Coursera
Computational Photography
Georgia Institute of Technology via Coursera
Einführung in Computer Vision
Technische Universität München (Technical University of Munich) via Coursera
Introduction to Computer Vision
Georgia Institute of Technology via Udacity