OpenAI Whisper - Robust Speech Recognition via Large-Scale Weak Supervision

Offered By: Aleksa Gordić - The AI Epiphany via YouTube

Tags

Natural Language Processing (NLP) Courses
Scaling Laws Courses
Language Detection Courses

Course Description

Overview

This video lecture covers OpenAI's Whisper, a robust speech recognition system trained with large-scale weak supervision. It reviews the paper's key findings and then walks through the code implementation. Topics from the paper include the collection of a large multilingual dataset, evaluation metrics and the issues with word error rate (WER), effective robustness, and scaling laws. The code walk-through covers the model architecture, the transcription task, audio loading and mel spectrograms, language detection, token logit suppression, voice activity detection, and decoding heuristics.

Syllabus

Intro
Paper overview
Collecting a large-scale weakly supervised dataset
Evaluation metric issues (WER)
Effective robustness
Scaling laws in progress
Decoding is hacky
Code walk-through
Model architecture diagram vs code
Transcription task
Loading the audio, mel spectrograms
Language detection
Transcription task continued
Suppressing token logits
Voice activity detection
Decoding and heuristics
Outro


Taught by

Aleksa Gordić - The AI Epiphany
