YoVDO

Pretrained Transformers as Universal Computation Engines

Offered By: Yannic Kilcher via YouTube

Tags

Machine Learning Courses, Transfer Learning Courses, Transformers Courses, Fine-Tuning Courses

Course Description

Overview

Explore a detailed analysis of a machine learning research paper on pretrained transformers as universal computation engines in this informative video. Dive into the concept of fine-tuning large-scale pretrained models for cross-domain transfer, including from language to vision tasks. Learn about Frozen Pretrained Transformers (FPTs) and their ability to generalize to various sequence classification tasks with minimal fine-tuning. Examine the importance of training LayerNorm, modality transfer, network architecture ablations, and model size considerations. Gain insights into the paper's findings on the superiority of language modeling as a pre-training task for cross-domain transfer and the potential of FPTs to match fully trained transformers in zero-shot generalization.
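The core recipe discussed in the video, freezing a pretrained transformer's self-attention and feed-forward weights while fine-tuning only the LayerNorm parameters (plus new input/output layers), can be sketched in a few lines of PyTorch. This is a minimal illustration, not the paper's actual GPT-2 setup; the model and dimensions here are placeholders.

```python
# Sketch of the Frozen Pretrained Transformer (FPT) fine-tuning recipe:
# freeze everything, then re-enable gradients only for LayerNorm parameters.
# The toy encoder below stands in for a pretrained language model.
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
model = nn.TransformerEncoder(encoder_layer, num_layers=2)

# Freeze all weights (attention, feed-forward, etc.)...
for p in model.parameters():
    p.requires_grad = False

# ...then unfreeze only the LayerNorm affine parameters.
for module in model.modules():
    if isinstance(module, nn.LayerNorm):
        for p in module.parameters():
            p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total}")
```

Only a tiny fraction of the weights ends up trainable, which is why the paper can treat the frozen transformer as a fixed "computation engine" and still adapt it to new modalities.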

Syllabus

- Intro & Overview
- Frozen Pretrained Transformers
- Evaluated Tasks
- The Importance of Training LayerNorm
- Modality Transfer
- Network Architecture Ablation
- Evaluation of the Attention Mask
- Are FPTs Overfitting or Underfitting?
- Model Size Ablation
- Is Initialization All You Need?
- Full Model Training Overfits
- Again the Importance of Training LayerNorm
- Conclusions & Comments


Taught by

Yannic Kilcher

Related Courses

Linear Circuits
Georgia Institute of Technology via Coursera
Introduction to Energy and Power Engineering
King Abdulaziz University via Rwaq (رواق)
Magnetic Materials and Devices
Massachusetts Institute of Technology via edX
Linear Circuits 2: AC Analysis
Georgia Institute of Technology via Coursera
Electric Power Transmission
Tecnológico de Monterrey via edX