Non-Parametric Transformers - Paper Explained
Offered By: Aleksa Gordić - The AI Epiphany via YouTube
Course Description
Overview
Explore Non-Parametric Transformers (NPT) in this 46-minute video lecture, which walks through the key concepts of the paper "Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning". Learn about the NPT architecture and its connections to BERT, Graph Neural Networks, and CNNs, and see how it performs on tabular data benchmarks. Discover how NPT learns underlying relational and causal mechanisms, and examine how it attends to similar vectors. Detailed explanations and visual aids support each topic.
Syllabus
Key ideas of the paper
Abstract
Note on k-NN non-parametric machine learning
Data and NPT setup explained
NPT loss is inspired by BERT
A high-level architecture overview
NPT jointly learns imputation and prediction
Architecture deep dive (input embeddings, etc.)
More details on the stochastic masking loss
Connections to Graph Neural Networks and CNNs
NPT achieves great results on tabular data benchmarks
NPT learns the underlying relational and causal mechanisms
NPT does rely on other datapoints
NPT attends to similar vectors
Conclusions
Taught by
Aleksa Gordić - The AI Epiphany
Related Courses
Deep Learning for Natural Language Processing
University of Oxford via Independent
Sequence Models
DeepLearning.AI via Coursera
Deep Learning Part 1 (IITM)
Indian Institute of Technology Madras via Swayam
Deep Learning - Part 1
Indian Institute of Technology, Ropar via Swayam
Deep Learning - IIT Ropar
Indian Institute of Technology, Ropar via Swayam