YoVDO

Speculative Decoding: Techniques for Faster LLM Inference

Offered By: Trelis Research via YouTube

Tags

Machine Learning Courses
Performance Testing Courses
Language Models Courses

Course Description

Overview

Explore the concept of speculative decoding in this 38-minute video from Trelis Research. Dive into various decoding techniques including naive speculative decoding, prompt-based n-gram speculation, lookahead decoding, and assisted decoding. Learn how these methods can significantly speed up inference in large language models. Follow along with performance testing and analysis of results to understand the practical implications of these techniques. Gain valuable tips for achieving faster inference in your own projects. Access additional resources, including free templates and paid guides, to further enhance your knowledge and implementation of advanced inference techniques.
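The core idea behind speculative decoding is to let a cheap draft model propose several tokens at once and have the large target model verify them in a single pass, keeping the longest correct prefix. The sketch below illustrates this draft-then-verify loop in the greedy case using toy stand-in "models" (the `target_next`/`draft_next` rules are hypothetical illustrations, not code from the video):

```python
def target_next(tok):
    # Toy stand-in for the large target model (greedy next token).
    return (tok + 3) % 10

def draft_next(tok):
    # Toy stand-in for the cheap draft model: agrees with the
    # target only when the current token is even.
    return target_next(tok) if tok % 2 == 0 else 0

def greedy_decode(prompt, num_new):
    # Baseline: plain greedy decoding with the target model.
    seq = list(prompt)
    for _ in range(num_new):
        seq.append(target_next(seq[-1]))
    return seq

def speculative_decode(prompt, num_new, k=4):
    seq = list(prompt)
    while len(seq) - len(prompt) < num_new:
        # 1) Draft: the cheap model proposes k tokens autoregressively.
        draft, last = [], seq[-1]
        for _ in range(k):
            last = draft_next(last)
            draft.append(last)
        # 2) Verify: the target checks all k positions (a real model
        #    does this in one parallel forward pass).
        last = seq[-1]
        for d in draft:
            t = target_next(last)
            if t == d:
                seq.append(d)       # draft token accepted
                last = d
            else:
                seq.append(t)       # target's correction replaces the draft
                break
        else:
            seq.append(target_next(last))  # bonus token: all k accepted
    return seq[:len(prompt) + num_new]
```

Because every emitted token is either verified or produced by the target model, the output matches plain greedy decoding exactly; the speedup comes from accepting multiple draft tokens per target pass.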

Syllabus

Faster inference with Speculative Decoding
Video Overview
How speculative decoding works
Naive speculative decoding
Prompt-based n-gram speculation
Lookahead decoding
Assisted decoding
Summary of Decoding Techniques
Performance Testing
Summary of Results
Tips for faster inference
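Of the techniques in the syllabus, prompt-based n-gram speculation is the simplest: instead of a separate draft model, it drafts tokens by matching the most recent n-gram against earlier context and copying the tokens that followed it, which works well for repetitive text such as code or extraction tasks. A minimal sketch (the `ngram_draft` helper is a hypothetical illustration, not code from the video):

```python
def ngram_draft(tokens, n=2, k=4):
    # Propose up to k draft tokens by finding the latest earlier
    # occurrence of the final n-gram and copying what followed it.
    if len(tokens) < n:
        return []
    tail = tokens[-n:]
    # Scan right-to-left, excluding the final n-gram's own position.
    for i in range(len(tokens) - n - 1, -1, -1):
        if tokens[i:i + n] == tail:
            return tokens[i + n:i + n + k]
    return []  # no earlier match: fall back to normal decoding
```

The drafted tokens are then verified by the target model exactly as in ordinary speculative decoding, so correctness is preserved even when the copied continuation is wrong.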


Taught by

Trelis Research

Related Courses

Microsoft Bot Framework and Conversation as a Platform
Microsoft via edX
Unlocking the Power of OpenAI for Startups - Microsoft for Startups
Microsoft via YouTube
Improving Customer Experiences with Speech to Text and Text to Speech
Microsoft via YouTube
Stanford Seminar - Deep Learning in Speech Recognition
Stanford University via YouTube
Select Topics in Python: Natural Language Processing
Codio via Coursera