The Pitfalls of Next-token Prediction in Language Models
Offered By: Simons Institute via YouTube
Course Description
Overview
Explore a thought-provoking lecture on the limitations of next-token prediction as a model of human intelligence. Examine the critical distinction between autoregressive inference and teacher-forced training in language models. Discover why the popular criticism, that errors compound during autoregressive inference, may overlook a more fundamental issue: teacher-forcing can fail to learn an accurate next-token predictor on certain classes of tasks in the first place. Investigate a general mechanism behind this failure and review empirical evidence from a minimal planning task on which both the Transformer and Mamba architectures struggle. Consider training models to predict multiple tokens in advance as a possible remedy. Gain insights that can inform future debates and inspire research beyond the current next-token prediction paradigm in artificial intelligence.
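To make the distinction the lecture draws concrete, here is a minimal PyTorch sketch (not from the lecture itself): it contrasts teacher-forced training, autoregressive decoding, and a multi-token-prediction loss. The tiny models `TinyCausalLM` and `TinyMultiHeadLM` and the horizon `k=4` are placeholder assumptions for illustration, not the architectures or settings studied in the talk.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCausalLM(nn.Module):
    """Stand-in causal model (hypothetical, not the lecture's setup):
    logits at each position depend only on tokens at or before it."""
    def __init__(self, vocab_size: int = 32, dim: int = 16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, T) -> logits: (batch, T, vocab)
        return self.out(self.emb(tokens))

def teacher_forced_loss(model: nn.Module, tokens: torch.Tensor) -> torch.Tensor:
    # Training: the model always conditions on the GROUND-TRUTH prefix
    # and is graded one next token at a time.
    logits = model(tokens[:, :-1])                 # predict positions 1..T-1
    targets = tokens[:, 1:]                        # shifted by one
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))

@torch.no_grad()
def autoregressive_generate(model: nn.Module, prefix: torch.Tensor,
                            n_steps: int) -> torch.Tensor:
    # Inference: the model conditions on its OWN earlier outputs, so an
    # early mistake is fed back in (the familiar "error compounding"
    # criticism). The lecture's point is that the mismatch can run
    # deeper: on planning-style tasks, teacher-forced training may never
    # learn the right predictor for the hard early tokens at all.
    tokens = prefix
    for _ in range(n_steps):
        next_logits = model(tokens)[:, -1, :]
        next_token = next_logits.argmax(dim=-1, keepdim=True)  # greedy
        tokens = torch.cat([tokens, next_token], dim=-1)
    return tokens

class TinyMultiHeadLM(nn.Module):
    """Hypothetical k-head variant for multi-token prediction."""
    def __init__(self, vocab_size: int = 32, dim: int = 16, k: int = 4):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.heads = nn.ModuleList(nn.Linear(dim, vocab_size) for _ in range(k))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h = self.emb(tokens)
        # (batch, T, k, vocab): head i predicts the token i+1 steps ahead
        return torch.stack([head(h) for head in self.heads], dim=2)

def multi_token_loss(model_k: nn.Module, tokens: torch.Tensor,
                     k: int = 4) -> torch.Tensor:
    # The remedy the lecture considers, sketched loosely: from each
    # prefix, predict the next k tokens jointly, so the model cannot
    # lean on ground-truth hints for the hard early tokens.
    T = tokens.size(1)
    logits = model_k(tokens[:, :-k])               # (batch, T-k, k, vocab)
    targets = torch.stack(
        [tokens[:, i + 1 : T - k + i + 1] for i in range(k)], dim=2)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))

# Toy run on random tokens: checks shapes only, no claim about the
# lecture's actual experiments.
batch = torch.randint(0, 32, (2, 12))
print(teacher_forced_loss(TinyCausalLM(), batch))                      # scalar loss
print(autoregressive_generate(TinyCausalLM(), batch[:, :4], 8).shape)  # (2, 12)
print(multi_token_loss(TinyMultiHeadLM(), batch))                      # scalar loss
```

The asymmetry the sketch makes visible: `teacher_forced_loss` always conditions on the ground-truth prefix, while `autoregressive_generate` feeds the model's own outputs back in. The lecture argues the trouble can already arise in the first function, before any compounding happens in the second, which is what motivates the multi-token variant.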
Syllabus
The Pitfalls of Next-token Prediction
Taught by
Simons Institute
Related Courses
Introduction to Artificial Intelligence
Stanford University via Udacity
Natural Language Processing
Columbia University via Coursera
Probabilistic Graphical Models 1: Representation
Stanford University via Coursera
Computer Vision: The Fundamentals
University of California, Berkeley via Coursera
Learning from Data (Introductory Machine Learning course)
California Institute of Technology via Independent