Do Pretrained Transformers Learn In-Context by Gradient Descent?
Offered By: Center for Language & Speech Processing (CLSP), JHU via YouTube
Course Description
Overview
Explore a 15-minute conference talk presented by Aayush Mishra at ICML 2024, examining the relationship between In-Context Learning (ICL) and Gradient Descent (GD) in pretrained language models. Delve into the limitations of previous theoretical connections between ICL and GD, highlighting how the experimental setups used to establish them differ from real-world language model training. Analyze the speaker's findings that ICL and GD differ in their sensitivity to the order of demonstrations, and examine comprehensive empirical analyses conducted on the LLaMa-7B model. Gain insights into how ICL and GD modify a language model's output distribution differently, and understand why the equivalence between the two remains an open hypothesis requiring further investigation.
Syllabus
Do Pretrained Transformers Learn In-Context by Gradient Descent? Aayush Mishra (ICML 2024)
Taught by
Center for Language & Speech Processing (CLSP), JHU
Related Courses
CMU Advanced NLP: How to Use Pre-Trained Models
Graham Neubig via YouTube
Stanford Seminar 2022 - Transformer Circuits, Induction Heads, In-Context Learning
Stanford University via YouTube
Pretraining Task Diversity and the Emergence of Non-Bayesian In-Context Learning for Regression
Simons Institute via YouTube
In-Context Learning: A Case Study of Simple Function Classes
Simons Institute via YouTube
AI Mastery: Ultimate Crash Course in Prompt Engineering for Large Language Models
Data Science Dojo via YouTube