Large Language Models - Will They Keep Getting Bigger?

Offered By: Massachusetts Institute of Technology via YouTube

Tags

Natural Language Processing (NLP) Courses
Model Optimization Courses
Fine-Tuning Courses

Course Description

Overview

Explore the future of large language models in this seminar by Luke Zettlemoyer at MIT. Delve into the challenges and possibilities of scaling language models, including sparse mixture-of-experts (MoE) models that reduce cross-node communication costs. Learn about prompting techniques that control for surface form variation, improving performance without extensive task-specific fine-tuning. Discover new forms of supervision for language model training, such as learning from hypertext and multi-modal web page structure. Gain insights into a potential next generation of NLP models, covering modern NLP scaling, algorithmic optimization, parallel training, domain structure, and inference procedures. Examine the benefits and challenges of modular approaches, perplexity comparisons, and the fundamental challenges facing generic language models. Investigate how noisy channel models, fine-tuning, and string scoring improve model performance. Consider the impact of web crawls, structured-data efficiency, and multimodality on the future of language models.
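
To make the routing idea concrete, here is a minimal sketch of top-1 expert routing in a sparse mixture-of-experts layer, in the spirit of the GShard and BASE layers work the syllabus covers. All names, shapes, and the random stand-in weights are illustrative assumptions, not the implementation discussed in the talk.

# A minimal sketch of top-1 routing in a sparse mixture-of-experts (MoE)
# layer. Everything below (shapes, random stand-in weights) is an
# illustrative assumption, not the models discussed in the talk.
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, n_experts = 8, 16, 4

x = rng.normal(size=(n_tokens, d_model))          # token representations
w_router = rng.normal(size=(d_model, n_experts))  # learned router weights
experts = [rng.normal(size=(d_model, d_model)) * 0.1
           for _ in range(n_experts)]             # one FFN matrix per expert

# Route each token to its single highest-scoring expert (top-1 routing).
logits = x @ w_router
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)         # softmax over experts
choice = probs.argmax(axis=1)                     # chosen expert per token

# Only the chosen expert runs for each token, so per-token compute (and,
# on a cluster, cross-node traffic) stays flat as n_experts grows.
y = np.empty_like(x)
for e in range(n_experts):
    mask = choice == e
    if mask.any():
        y[mask] = (x[mask] @ experts[e]) * probs[mask, e:e + 1]

print("tokens per expert:", np.bincount(choice, minlength=n_experts))

Because each token touches only one expert, adding experts grows model capacity without growing per-token compute, which is what keeps communication costs manageable in the sparse models the talk describes.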

Syllabus

Introduction
What are language models
Modern NLP
Scaling
Sparse models
GShard
BASE layers
Formal optimization
Algorithmic optimization
Experiments
Comparison
Benefits
DEMix layers
Representations
Simple routing
Training time
Parallel training
Data curation
Unrealistic setting
Domain structure
Inference procedure
Perplexity numbers
Modularity
Remove experts
Summary
Generic language models
Hot dog example
Hot pan example
Common sense example
Large language models
The fundamental challenge
Surface form competition
Flip the reasoning
Key intuition
Noisy channel models (see the sketch after this syllabus)
Fine-tuning
Scoring strings
Web crawls
Example output
Structured data
Efficiency
Questions
Density estimation
Better training objectives
Optimization
Probability
Induction
Multimodality
Outliers
Compute vs. data
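
The last stretch of the syllabus (surface form competition, noisy channel models, scoring strings) contrasts two ways of scoring answer options with a language model. Here is a minimal sketch of that contrast, assuming a hypothetical lm_logprob stand-in rather than a real LM:

# A sketch of direct vs. noisy-channel scoring of answer strings for a
# classification prompt. lm_logprob is a hypothetical stand-in for a real
# language model's conditional log-probability log p(text | context); the
# toy heuristic below exists only so the sketch runs end to end.

def lm_logprob(text: str, context: str) -> float:
    # Toy stand-in: reward tokens of `text` that appear in `context`,
    # lightly penalize length. Replace with a real LM score in practice.
    ctx = set(context.lower().split())
    toks = text.lower().split()
    return float(sum(t in ctx for t in toks)) - 0.5 * len(toks)

x = "the movie was a complete waste of time"
labels = ["It was terrible.", "It was great."]

# Direct model: score p(label | input). Raw probabilities can let frequent
# or short label strings win on surface form alone (surface form competition).
direct = {y: lm_logprob(y, x) for y in labels}

# Channel model: score p(input | label) instead, so every candidate label
# must explain the same input string.
channel = {y: lm_logprob(x, y) for y in labels}

# PMI-style calibration of the direct score: subtract each label's
# context-free score so no label wins on its prior alone.
pmi = {y: direct[y] - lm_logprob(y, "") for y in labels}

print("direct: ", direct)
print("channel:", channel)
print("pmi:    ", pmi)

With a real language model in place of the toy stand-in, channel scoring makes every candidate label explain the same input string, and the PMI-style calibration keeps frequent label strings from winning on prior probability alone; both control for the surface form variation mentioned in the overview.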


Taught by

MIT Embodied Intelligence

Related Courses

Amazon SageMaker JumpStart Foundations (Japanese)
Amazon Web Services via AWS Skill Builder
AWS Flash - Generative AI with Diffusion Models
Amazon Web Services via AWS Skill Builder
AWS Flash - Operationalize Generative AI Applications (FMOps/LLMOps)
Amazon Web Services via AWS Skill Builder
AWS SimuLearn: Automate Fine-Tuning of an LLM
Amazon Web Services via AWS Skill Builder
AWS SimuLearn: Fine-Tune a Base Model with RLHF
Amazon Web Services via AWS Skill Builder