Neural Nets for NLP 2017 - Attention
Offered By: Graham Neubig via YouTube
Course Description
Syllabus
Intro
Sentence Representations
Basic Idea (Bahdanau et al. 2015)
Calculating Attention (1) (see the code sketch after the syllabus)
A Graphical Example
Attention Score Functions (2)
Input Sentence
Previously Generated Things
Various Modalities
Hierarchical Structures (Yang et al. 2016)
Multiple Sources
Intra-Attention / Self-Attention (Cheng et al. 2016): each element in the sentence attends to the other elements, giving context-sensitive encodings
Coverage
Incorporating Markov Properties (Cohn et al. 2015)
Bidirectional Training (Cohn et al. 2015)
Supervised Training (Mi et al. 2016)
Attention is not Alignment! (Koehn and Knowles 2017): attention is often blurred
Monotonic Attention (e.g. Yu et al. 2016)
Convolutional Attention (Allamanis et al. 2016)
Multi-headed Attention
Summary of the "Transformer" (Vaswani et al. 2017)
Attention Tricks
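The "Calculating Attention" and "Attention Score Functions" items above cover how a decoder state is scored against each encoder state and the resulting weights are used to build a context vector. Below is a minimal NumPy sketch of that computation; it is not code from the lecture, and the variable names (query, keys, values) and the toy dimensions are assumptions made purely for illustration.

```python
# Minimal sketch of attention as surveyed in the lecture (not the lecture's code).
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def dot_score(query, keys):
    # Dot-product score: s_i = k_i . q
    return keys @ query

def bilinear_score(query, keys, W):
    # Bilinear score: s_i = k_i^T W q
    return keys @ W @ query

def mlp_score(query, keys, W1, w2):
    # Multi-layer perceptron score (Bahdanau et al. 2015 style):
    # s_i = w2^T tanh(W1 [k_i; q])
    concat = np.concatenate([keys, np.tile(query, (keys.shape[0], 1))], axis=1)
    return np.tanh(concat @ W1.T) @ w2

def attend(query, keys, values, score_fn=dot_score, **kw):
    scores = score_fn(query, keys, **kw)   # one score per source position
    weights = softmax(scores)              # normalize into an attention distribution
    context = weights @ values             # weighted sum of values = context vector
    return context, weights

# Toy usage: 5 source positions, hidden size 4 (illustrative numbers only).
rng = np.random.default_rng(0)
keys = values = rng.normal(size=(5, 4))    # encoder states used as both keys and values
query = rng.normal(size=4)                 # current decoder state
context, weights = attend(query, keys, values)
print(weights.round(3), context.round(3))
```

Swapping score_fn between the dot-product, bilinear, and MLP variants corresponds to the alternatives listed under "Attention Score Functions (2)".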
Taught by
Graham Neubig
Related Courses
Transformers: Text Classification for NLP Using BERT (LinkedIn Learning)
TensorFlow: Working with NLP (LinkedIn Learning)
TransGAN - Two Transformers Can Make One Strong GAN - Machine Learning Research Paper Explained (Yannic Kilcher via YouTube)
Nyströmformer - A Nyström-Based Algorithm for Approximating Self-Attention (Yannic Kilcher via YouTube)
Recreate Google Translate - Model Training (Edan Meyer via YouTube)