
Efficient Inference of Extremely Large Transformer Models

Offered By: Toronto Machine Learning Series (TMLS) via YouTube

Tags

Transformer Models Courses, Machine Learning Courses, Deep Learning Courses, Model Optimization Courses, Model Compression Courses

Course Description

Overview

Explore the challenges of efficient inference for massive transformer-based language models, and the solutions to them, in this 28-minute Toronto Machine Learning Series (TMLS) talk. Learn how multi-billion-parameter models are optimized for production environments, and discover key techniques for making them faster, smaller, and more cost-effective, including model compression, efficient attention mechanisms, and optimal model parallelism strategies. Bharat Venkitesh, Senior Machine Learning Engineer at Cohere, discusses how the inference tech stack was established and the latest advancements in handling extremely large transformer models.

Syllabus

Efficient Inference of Extremely Large Transformer Models


Taught by

Toronto Machine Learning Series (TMLS)

Related Courses

3D-печать для всех и каждого
Tomsk State University via Coursera
Developing a Multidimensional Data Model
Microsoft via edX
Launching into Machine Learning 日本語版
Google Cloud via Coursera
Art and Science of Machine Learning 日本語版
Google Cloud via Coursera
Launching into Machine Learning auf Deutsch
Google Cloud via Coursera