
Red Teaming Language Model Detectors with Language Models

Offered By: USC Information Sciences Institute via YouTube

Tags

Language Models Courses
Cybersecurity Courses
Prompt Engineering Courses
Linguistic Analysis Courses
Text Generation Courses
Machine Learning Security Courses
Adversarial Attacks Courses

Course Description

Overview

Explore a 48-minute conference talk presented on February 22, 2024, by Yihan Wang of UCLA at the USC Information Sciences Institute. Delve into an investigation of the robustness and reliability of large language model (LLM) detectors under adversarial attacks. Learn about two attack strategies: replacing words with context-appropriate synonyms and using instructional prompts to alter writing style. Understand the challenging setting in which an auxiliary LLM, itself protected by a detector, is used to generate the word replacements or instructional prompts. Discover how these attacks effectively compromise detector performance, highlighting the urgent need for more robust LLM-generated text detection systems. Gain insights into other recent work on trustworthy and ethical LLMs. The speaker, Yihan Wang, is a PhD candidate at UCLA focusing on trustworthy and generalizable machine learning, and a recipient of the 2023 UCLA-Amazon Fellowship.
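To make the first attack strategy concrete, the sketch below illustrates a synonym-substitution attack in minimal form. Everything in it is a hypothetical stand-in: detector_score and propose_synonyms are dummy placeholders for a real LLM-text detector and the auxiliary LLM from the talk, and the greedy score-guided search is one plausible way such an attack could work, not the authors' exact method.

```python
# Minimal sketch of a synonym-substitution attack on an LLM-text detector.
# detector_score and propose_synonyms are hypothetical stand-ins (not APIs
# from the talk or paper) so the example runs end to end.

def detector_score(text: str) -> float:
    # Stand-in detector: probability that `text` is LLM-generated.
    # A real detector would be a trained classifier or a statistical test.
    return min(1.0, 0.5 + 0.3 * text.count("delve"))

def propose_synonyms(word: str) -> list[str]:
    # Stand-in for an auxiliary LLM proposing context-appropriate synonyms;
    # a fixed lookup table keeps the sketch self-contained.
    table = {"delve": ["dig", "look"], "explore": ["examine", "study"]}
    return table.get(word, [])

def greedy_substitution_attack(text: str, max_edits: int = 5) -> str:
    """Greedily replace words whose substitution most lowers the detector score."""
    words = text.split()
    for _ in range(max_edits):
        current = detector_score(" ".join(words))
        best = None  # (score, index, replacement)
        for i, w in enumerate(words):
            for syn in propose_synonyms(w):
                candidate = " ".join(words[:i] + [syn] + words[i + 1:])
                s = detector_score(candidate)
                if best is None or s < best[0]:
                    best = (s, i, syn)
        if best is None or best[0] >= current:
            break  # no remaining substitution improves evasion
        _, i, syn = best
        words[i] = syn
    return " ".join(words)

if __name__ == "__main__":
    sample = "We delve into the robustness of LLM detectors."
    print("score before:", detector_score(sample))
    evaded = greedy_substitution_attack(sample)
    print("score after: ", detector_score(evaded), "->", evaded)
```

In a realistic setting, the auxiliary LLM would condition its proposals on the surrounding context, and the attacker might see only a binary detect/no-detect label rather than a score, which makes the search harder; the talk's second strategy sidesteps per-word search entirely by prompting the generating model to change its writing style.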

Syllabus

Red Teaming Language Model Detectors with Language Models


Taught by

USC Information Sciences Institute

Related Courses

Computer Security
Stanford University via Coursera
Cryptography II
Stanford University via Coursera
Malicious Software and its Underground Economy: Two Sides to Every Story
University of London International Programmes via Coursera
Building an Information Risk Management Toolkit
University of Washington via Coursera
Introduction to Cybersecurity
National Cybersecurity Institute at Excelsior College via Canvas Network