YoVDO

Tuning GPT-3 on a Single GPU via Zero-Shot Hyperparameter Transfer

Offered By: Massachusetts Institute of Technology via YouTube

Tags

GPT-3 Courses
ChatGPT Courses
Zero-shot learning (ZSL) Courses
Hyperparameter Optimization Courses

Course Description

Overview

This MIT seminar explains how to tune the hyperparameters of GPT-3 on a single GPU using zero-shot hyperparameter transfer. The method relies on the maximal update parametrization (µP), under which narrow and wide neural networks share the same optimal hyperparameters, so values tuned on a small model can be reused on a much larger one without further tuning. Using this approach, the 6.7-billion-parameter version of GPT-3 was tuned at a cost of only 7% of its pretraining compute budget. The talk also covers the theory behind µP, including its connection to infinite-width neural networks and Tensor Programs theory. The speaker is Greg Yang, a scientist at Microsoft Research, who presents findings from his research paper. The seminar is suitable both for general machine learning practitioners and for those interested in the theoretical aspects of neural networks.

Syllabus

Introduction
Material
Underlying Technology
Primary Stability
Other Parameters
Methodology
Training Curves
Summary
Intensive vs Extensive Properties
Extensive vs Intensive Properties
The Plan
Example
General Tuning
Experimental Results
BERT
Evaluation Results
Vertical Foundation
Parametrization
Theory of Everything


Taught by

MIT Embodied Intelligence

Related Courses

Amazon SageMaker: Build an Object Detection Model Using Images Labeled with Ground Truth (Simplified Chinese)
Amazon Web Services via AWS Skill Builder
Amazon SageMaker: Build an Object Detection Model Using Images Labeled with Ground Truth (Indonesian)
Amazon Web Services via AWS Skill Builder
Amazon SageMaker: Build an Object Detection Model Using Images Labeled with Ground Truth (Italian)
Amazon Web Services via AWS Skill Builder
Automatic Model Tuning in Amazon SageMaker (Thai)
Amazon Web Services via AWS Skill Builder