How to Steal ChatGPT's Embedding Size and Other Low-rank Logit Tricks

Offered By: USC Information Sciences Institute via YouTube

Tags

ChatGPT Courses
Linear Algebra Courses
Reverse Engineering Courses
API Security Courses
OpenAI Courses

Course Description

Overview

Explore the implications of large language model (LLM) commercialization and API restrictions in this 48-minute talk presented by Matt Finlayson of the USC Information Sciences Institute. Discover how, with minimal assumptions about model architecture, significant non-public information can be extracted from API-protected LLMs using a relatively small number of queries. Learn about the softmax bottleneck in modern LLMs and how it can be exploited to obtain full-vocabulary outputs, audit model updates, identify source LLMs, and even uncover hidden model sizes.

Examine the empirical investigations that led to estimating OpenAI's gpt-3.5-turbo embedding size at approximately 4096. Consider potential safeguards against these techniques and discuss how these capabilities might contribute to greater transparency and accountability in AI development. Gain insights from Finlayson's background in NLP, computer science, and linguistics as he explores the practical consequences of language model architectural design, from security to generation and learning processes.
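The core trick behind these results can be sketched in a few lines. In a transformer LM, the final logits are computed as W h, where W is the vocab-size-by-d output embedding matrix and h is a d-dimensional hidden state, so every full-vocabulary logit vector lies in a subspace of dimension at most d. Stacking enough logit vectors and measuring the numerical rank of the resulting matrix therefore exposes the embedding size. The Python sketch below simulates this idea under simplified assumptions (random matrices stand in for the model and the API, and the sizes are illustrative); it is not the speaker's code.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative sizes (hypothetical; real models are much larger).
    vocab_size = 8000   # vocabulary size V
    hidden_size = 512   # embedding size d -- the "secret" to recover

    # An LM's output head maps a d-dim hidden state to V logits:
    # logits = W @ h, so all logit vectors lie in the column space of W.
    W = rng.standard_normal((vocab_size, hidden_size))

    def query_logits(n_queries):
        """Stand-in for an API call returning full-vocabulary logit vectors."""
        H = rng.standard_normal((hidden_size, n_queries))
        return W @ H  # shape (V, n_queries)

    # Collect somewhat more logit vectors than the suspected embedding size.
    L = query_logits(hidden_size + 64)

    # Singular values collapse to ~0 beyond index d, so the numerical rank
    # of the stacked logit matrix reveals the hidden embedding size.
    s = np.linalg.svd(L, compute_uv=False)
    estimated_d = int((s > s[0] * 1e-8).sum())
    print("estimated embedding size:", estimated_d)  # prints 512

Against a real API that returns only top-k log probabilities, the talk describes first reconstructing full-vocabulary outputs (for example, via logit-bias queries) before applying the same rank argument.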

Syllabus

How to Steal ChatGPT’s Embedding Size, and Other Low-rank Logit Tricks


Taught by

USC Information Sciences Institute

Related Courses

ChatGPT and AI: A How-To Guide for Managers and HR
CNAM via France Université Numerique
Generating New Recipes using GPT-2
Coursera Project Network via Coursera
Deep Learning NLP: Training GPT-2 from scratch
Coursera Project Network via Coursera
Data Science A-Z: Hands-On Exercises & ChatGPT Prize [2024]
Udemy
Deep Learning A-Z 2024: Neural Networks, AI & ChatGPT Prize
Udemy