Visual Features for Context-Aware Speech Recognition - 2016

Offered By: Center for Language & Speech Processing (CLSP), JHU via YouTube

Course Description

Overview

Explore cutting-edge techniques for improving automatic speech recognition in challenging multimedia content through this comprehensive lecture by Florian Metze from Carnegie Mellon University. Delve into methods for adapting acoustic and language models using visual context from video, such as detected objects and scenes. Learn about experiments on "how-to" videos that demonstrate reduced word error rates by incorporating visual information. Examine approaches for handling speech variability, speaker-microphone distance, and audio-visual fusion. Gain insights into applications for robotics, human-computer interaction, and large-scale multimedia indexing. Discover how this research aims to bridge the gap between video-to-text and speech-to-text communities.

Syllabus

Intro
Outline
Automatic Speech Recognition
Speech Variability (Spectral)
Decoding Procedure
Experimental Setup
Simple Extensions
Performance on Switchboard
IARPA "Aladdin" Project
Speaker Microphone Distance (SMD)
Training SMD Extractors
Training SMD Descriptors
SMD Results
SMD Analysis
Audio-Visual ASR
Speaker Attributes
Speaker Actions
Semantic Indexing CNN Features
Fusion of Approaches
Analysis "indoor" vs "outdoor"
Summary

Taught by

Center for Language & Speech Processing (CLSP), JHU
