CMU Multilingual NLP 2020 - Data Augmentation for Machine Translation
Offered By: Graham Neubig via YouTube
Course Description
Overview
Explore data augmentation techniques for machine translation in this 25-minute lecture from CMU's Multilingual Natural Language Processing course. The lecture covers methods that draw on monolingual data and on high-resource languages (HRLs), including back translation, multilingual training, and pivoting. Specific topics include iterative back-translation, augmentation using English-HRL data, dictionary-based augmentation, word alignment, and word-by-word augmentation with reordering. The lecture also outlines the data challenges of low-resource machine translation and practical ways to improve translation quality when parallel data is scarce.
Syllabus
Intro
Data Challenges in Low-resource MT
Multilingual Training Approaches
Data Augmentation 101: Back Translation
Back Translation Idea
How to Generate Translations
Iterative Back-translation
Back Translation Issues
English - HRL Augmentation
Augmentation via Pivoting
Data w/ Various Types of Pivoting
Monolingual Data Copying
Dictionary-based Augmentation
An Aside: Word Alignment
Word-by-word Data Augmentation
Word-by-word Augmentation w/ Reordering
Taught by
Graham Neubig