CMU Multilingual NLP 2020 - Data Augmentation for Machine Translation
Offered By: Graham Neubig via YouTube
Course Description
Overview
Explore data augmentation techniques for machine translation in this 25-minute lecture from CMU's Multilingual Natural Language Processing course. The lecture covers methods that draw on monolingual data and on high-resource languages (HRLs), including back-translation, multilingual training approaches, and pivoting strategies. Learn about iterative back-translation, English-HRL augmentation, and dictionary-based techniques, along with word alignment and word-by-word data augmentation with reordering. Understand the data challenges of low-resource machine translation and practical ways to improve translation quality when parallel data is scarce.
Syllabus
Intro
Data Challenges in Low-resource MT
Multilingual Training Approaches
Data Augmentation 101: Back Translation
Back Translation Idea
How to Generate Translations
Iterative Back-translation
Back Translation Issues
English-HRL Augmentation
Augmentation via Pivoting
Data w/ Various Types of Pivoting
Monolingual Data Copying
Dictionary-based Augmentation
An Aside: Word Alignment
Word-by-word Data Augmentation
Word-by-word Augmentation w/ Reordering
Taught by
Graham Neubig
Related Courses
Natural Language Processing
Columbia University via Coursera
Natural Language Processing
Stanford University via Coursera
Introduction to Natural Language Processing
University of Michigan via Coursera
moocTLH: Nuevos retos en las tecnologías del lenguaje humano
Universidad de Alicante via Miríadax
Natural Language Processing
Indian Institute of Technology, Kharagpur via Swayam