Pretrained Transformers as Universal Computation Engines
Offered By: Yannic Kilcher via YouTube
Course Description
Overview
Explore a detailed analysis of a machine learning research paper on pretrained transformers as universal computation engines. Dive into the concept of fine-tuning large-scale pretrained models for cross-domain transfer, including from language to vision tasks. Learn about Frozen Pretrained Transformers (FPTs) and their ability to generalize to various sequence classification tasks with minimal fine-tuning. Examine the importance of training LayerNorm, modality transfer, network architecture ablations, and model size considerations. Gain insights into the paper's findings on the superiority of language modeling as a pretraining task for cross-domain transfer and the potential of FPTs to match the performance of transformers fully trained on the downstream tasks.
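To make the FPT idea concrete, here is a minimal sketch of the setup in PyTorch, assuming the Hugging Face transformers library: a pretrained GPT-2 is frozen except for its LayerNorm parameters, and small input/output projections are trained on the new task. The wrapper class name and the toy dimensions are illustrative, not taken from the paper or the video.

```python
# Minimal FPT-style sketch (illustrative only): freeze a pretrained GPT-2
# except for LayerNorm, and train new input/output projections.
import torch
import torch.nn as nn
from transformers import GPT2Model


class FrozenPretrainedTransformer(nn.Module):
    def __init__(self, input_dim: int, num_classes: int):
        super().__init__()
        self.gpt2 = GPT2Model.from_pretrained("gpt2")

        # Freeze everything except LayerNorm parameters ('ln_1', 'ln_2', 'ln_f');
        # the self-attention and feedforward weights stay fixed.
        for name, param in self.gpt2.named_parameters():
            param.requires_grad = "ln" in name

        hidden = self.gpt2.config.n_embd
        self.input_proj = nn.Linear(input_dim, hidden)     # trained from scratch
        self.output_head = nn.Linear(hidden, num_classes)  # trained from scratch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_dim) projected into GPT-2's embedding space
        embeds = self.input_proj(x)
        hidden_states = self.gpt2(inputs_embeds=embeds).last_hidden_state
        # Classify from the final token's representation
        return self.output_head(hidden_states[:, -1, :])


# Example: a sequence classification task with 8-dimensional tokens and 2 classes
model = FrozenPretrainedTransformer(input_dim=8, num_classes=2)
logits = model(torch.randn(4, 16, 8))  # (batch=4, seq_len=16) -> (4, 2)
```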
Syllabus
- Intro & Overview
- Frozen Pretrained Transformers
- Evaluated Tasks
- The Importance of Training LayerNorm
- Modality Transfer
- Network Architecture Ablation
- Evaluation of the Attention Mask
- Are FPTs Overfitting or Underfitting?
- Model Size Ablation
- Is Initialization All You Need?
- Full Model Training Overfits
- Again the Importance of Training LayerNorm
- Conclusions & Comments
Taught by
Yannic Kilcher
Related Courses
- Models and Platforms for Generative AI (IBM via edX)
- Natural Language Processing with Attention Models (DeepLearning.AI via Coursera)
- Circuitos con SPICE: Sistemas trifásicos y análisis avanzado (Pontificia Universidad Católica de Chile via Coursera)
- Linear Circuits (Georgia Institute of Technology via Coursera)
- Intro to AI Transformers (Codecademy)