How to Steal ChatGPT's Embedding Size and Other Low-rank Logit Tricks

Offered By: USC Information Sciences Institute via YouTube

Tags

ChatGPT Courses
Linear Algebra Courses
Reverse Engineering Courses
API Security Courses
OpenAI Courses

Course Description

Overview

Explore the implications of large language model (LLM) commercialization and API restrictions in this 48-minute talk presented by Matt Finlayson from USC Information Sciences Institute. Discover how, with minimal assumptions about model architecture, significant non-public information can be extracted from API-protected LLMs using a relatively small number of queries. Learn about the softmax bottleneck in modern LLMs and how it can be exploited to obtain full-vocabulary outputs, audit model updates, identify source LLMs, and even uncover hidden model sizes. Examine the empirical investigations that led to estimating OpenAI's gpt-3.5-turbo embedding size at approximately 4096. Consider potential safeguards against these techniques and discuss how these capabilities might contribute to greater transparency and accountability in AI development. Gain insights from Finlayson's background in NLP, computer science, and linguistics as he explores the practical consequences of language model architectural design, from security to generation and learning processes.
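The core idea behind the talk's embedding-size attack can be sketched in a few lines: because a transformer's final logits are a linear projection of a low-dimensional hidden state (the softmax bottleneck), any collection of full-vocabulary logit vectors lies in a subspace whose dimension equals the model's embedding size. The toy simulation below illustrates this rank argument with made-up dimensions; it does not query any real API, and the sizes (`d`, `vocab_size`, `n_queries`) are illustrative assumptions, not OpenAI's.

```python
import numpy as np

# Toy illustration of the low-rank logit observation: logits = W @ h,
# where W is the (vocab_size x d) output embedding matrix and h is a
# d-dimensional hidden state. Stacking more than d logit vectors and
# measuring the matrix rank recovers d, the hidden embedding size.
# All dimensions here are hypothetical.

rng = np.random.default_rng(0)
d, vocab_size, n_queries = 16, 100, 40  # assumed model dimensions

W = rng.standard_normal((vocab_size, d))             # output embedding matrix
hidden_states = rng.standard_normal((n_queries, d))  # one hidden state per query
logits = hidden_states @ W.T                         # shape: n_queries x vocab_size

# Although each logit vector has vocab_size entries, the stacked matrix
# has rank d, exposing the embedding size.
estimated_embedding_size = np.linalg.matrix_rank(logits)
print(estimated_embedding_size)
```

In the setting described in the talk, the logit vectors are not handed out directly; they must first be reconstructed from restricted API outputs (e.g., top-k log probabilities combined with logit bias), after which the same rank measurement applies.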

Syllabus

How to Steal ChatGPT’s Embedding Size, and Other Low-rank Logit Tricks


Taught by

USC Information Sciences Institute

Related Courses

Coding the Matrix: Linear Algebra through Computer Science Applications
Brown University via Coursera
Mathematical Methods for Quantitative Finance
University of Washington via Coursera
Introduction à la théorie de Galois
École normale supérieure via Coursera
Linear Algebra - Foundations to Frontiers
The University of Texas at Austin via edX
Massively Multivariable Open Online Calculus Course
Ohio State University via Coursera