Scaling LLM Test-Time Compute Optimally for Improved Performance
Offered By: Yannic Kilcher via YouTube
Course Description
Overview
Explore a comprehensive analysis of scaling inference-time computation in Large Language Models (LLMs) through this in-depth video presentation. Delve into the research paper investigating how LLMs can improve their performance by using additional test-time computation. Examine two primary mechanisms for scaling test-time compute: searching against dense, process-based verifier reward models, and adaptively updating the model's distribution over a response given the prompt. Discover how the effectiveness of these approaches varies with prompt difficulty, motivating a "compute-optimal" strategy that allocates test-time compute adaptively per prompt. Learn how this strategy can improve test-time compute efficiency by more than 4x compared to a best-of-N baseline. Gain insights into the implications of these findings for LLM pretraining and the trade-offs between inference-time and pretraining compute, and understand how, in certain scenarios, test-time compute can be leveraged to outperform significantly larger models in a FLOPs-matched evaluation.
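To make the best-of-N baseline concrete, here is a minimal Python sketch: sample N candidate responses and keep the one a verifier reward model scores highest. The `generate` and `score` callables are hypothetical stand-ins for an LLM sampler and a trained verifier, not code from the paper or the video.

```python
import random
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int = 8) -> str:
    """Sample n candidate responses, then return the one the verifier scores highest."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

# Toy stand-ins so the sketch runs without a real model or verifier.
def toy_generate(prompt: str) -> str:
    # A real system would sample a completion from an LLM here.
    return f"candidate answer #{random.randint(0, 999)}"

def toy_score(prompt: str, response: str) -> float:
    # A real process-based verifier would rate the response's reasoning steps;
    # a random score just exercises the selection logic.
    return random.random()

if __name__ == "__main__":
    print(best_of_n("What is 17 * 24?", toy_generate, toy_score, n=4))
```

The compute-optimal strategy discussed in the video goes further than this fixed-N baseline: it chooses how much test-time compute to spend, and which mechanism (verifier-guided search versus adaptive response revision) to use, based on the estimated difficulty of each prompt.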
Syllabus
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper)
Taught by
Yannic Kilcher
Related Courses
Introduction To Mechanical Micro Machining (Indian Institute of Technology, Kharagpur via Swayam)
Biomaterials - Intro to Biomedical Engineering (Udemy)
OpenAI Whisper - Robust Speech Recognition via Large-Scale Weak Supervision (Aleksa Gordić - The AI Epiphany via YouTube)
Turbulence as Gibbs Statistics of Vortex Sheets - Alexander Migdal (Institute for Advanced Study via YouTube)
City Analytics - Professor Peter Grindrod CBE (Alan Turing Institute via YouTube)