Non-Parametric Transformers - Paper Explained
Offered By: Aleksa Gordić - The AI Epiphany via YouTube
Course Description
Overview
Dive into Non-Parametric Transformers (NPT) with this 46-minute video lecture. Explore the key concepts from the paper "Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning". Learn about the NPT architecture and its connections to BERT, Graph Neural Networks, and CNNs, and see how it achieves strong results on tabular data benchmarks. Discover how NPT learns underlying relational and causal mechanisms, relies on other datapoints, and attends to similar vectors. The lecture walks through the paper with detailed explanations and visual aids.
Syllabus
Key ideas of the paper
Abstract
Note on k-NN non-parametric machine learning
Data and NPT setup explained
NPT loss is inspired by BERT
A high-level architecture overview
NPT jointly learns imputation and prediction
Architecture deep dive: input embeddings, etc.
More details on the stochastic masking loss
Connections to Graph Neural Networks and CNNs
NPT achieves great results on tabular data benchmarks
NPT learns the underlying relational, causal mechanisms
NPT does rely on other datapoints
NPT attends to similar vectors
Conclusions
Taught by
Aleksa Gordić - The AI Epiphany
Related Courses
Neural Networks for Machine Learning
University of Toronto via Coursera
機器學習技法 (Machine Learning Techniques)
National Taiwan University via Coursera
Machine Learning Capstone: An Intelligent Application with Deep Learning
University of Washington via Coursera
Прикладные задачи анализа данных (Applied Problems in Data Analysis)
Moscow Institute of Physics and Technology via Coursera
Leading Ambitious Teaching and Learning
Microsoft via edX