YoVDO

Do Pretrained Transformers Learn In-Context by Gradient Descent?

Offered By: Center for Language & Speech Processing (CLSP), JHU via YouTube

Tags

Transformers Courses
Artificial Intelligence Courses
Machine Learning Courses
Deep Learning Courses
Neural Networks Courses
Gradient Descent Courses
LLaMA (Large Language Model Meta AI) Courses
Language Models Courses
In-context Learning Courses

Course Description

Overview

Explore a 15-minute conference talk presented by Aayush Mishra at ICML 2024 that examines the relationship between in-context learning (ICL) and gradient descent (GD) in pretrained language models. The talk reviews the limitations of earlier theoretical connections between ICL and GD, highlighting how the experimental setups behind those results differ from the way real-world language models are trained. It then presents findings that ICL and GD differ in their sensitivity to demonstration order, supported by comprehensive empirical analyses on the LLaMA-7B model. Gain insight into how ICL and GD modify a language model's output distribution differently, and understand why the equivalence between the two remains an open hypothesis requiring further investigation.
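
As a rough illustration of what "sensitivity to demonstration order" can mean on the gradient descent side, the sketch below fits a one-parameter linear model to a few made-up demonstrations and measures how much the learned weight varies across all orderings. It is a toy under stated assumptions, not the talk's experimental setup: the data, the function names (`full_batch_gd`, `sequential_gd`, `order_spread`), and the hyperparameters are all hypothetical, and no language model is involved.

```python
import itertools


def full_batch_gd(demos, lr=0.1, steps=50):
    """Fit y ~ w * x by full-batch gradient descent on mean squared error."""
    w = 0.0
    for _ in range(steps):
        # The gradient averages over all demos, so their order does not matter
        # (up to floating-point rounding in the sum).
        grad = sum(2 * (w * x - y) * x for x, y in demos) / len(demos)
        w -= lr * grad
    return w


def sequential_gd(demos, lr=0.1):
    """One pass of per-example updates, visiting demos in the given order."""
    w = 0.0
    for x, y in demos:
        w -= lr * 2 * (w * x - y) * x
    return w


def order_spread(learner, demos):
    """Max minus min of the learned weight over every ordering of demos."""
    weights = [learner(list(p)) for p in itertools.permutations(demos)]
    return max(weights) - min(weights)


# Hypothetical toy demonstrations (x, y), roughly following y = 2x.
demos = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

batch_spread = order_spread(full_batch_gd, demos)
seq_spread = order_spread(sequential_gd, demos)
print(f"full-batch GD spread across orderings: {batch_spread:.2e}")
print(f"sequential GD spread across orderings: {seq_spread:.2e}")
```

In this toy, the full-batch learner is essentially unaffected by reordering, while the single-pass sequential learner is not. The same permutation-based measurement could in principle be applied to a model's in-context predictions by reordering the demonstrations in its prompt, but that requires an actual language model and is outside this sketch.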

Syllabus

Do Pretrained Transformers Learn In-Context by Gradient Descent? Aayush Mishra (ICML 2024)


Taught by

Center for Language & Speech Processing (CLSP), JHU

Related Courses

Linear Circuits
Georgia Institute of Technology via Coursera
مقدمة في هندسة الطاقة والقوى (Introduction to Energy and Power Engineering)
King Abdulaziz University via Rwaq (رواق)
Magnetic Materials and Devices
Massachusetts Institute of Technology via edX
Linear Circuits 2: AC Analysis
Georgia Institute of Technology via Coursera
Transmisión de energía eléctrica (Electric Power Transmission)
Tecnológico de Monterrey via edX