YoVDO

Do Androids Know They're Only Dreaming of Electric Sheep?

Offered By: USC Information Sciences Institute via YouTube

Tags

Transformer Models Courses
Artificial Intelligence Courses
Machine Learning Courses
Computational Linguistics Courses
Language Models Courses
Interpretability Courses

Course Description

Overview

In this 59-minute lecture at the USC Information Sciences Institute, Sky Wang of Columbia University examines hallucination detection in transformer language models. The talk covers the design of probes trained on a model's internal representations to predict hallucinatory behavior in in-context generation tasks, and the creation of a span-annotated dataset of organic and synthetic hallucinations across various tasks. It discusses the ecological validity challenges of using probes trained on synthetic hallucinations to detect organic ones, how hidden state information about hallucination varies across tasks and distributions, and how the saliency of intrinsic and extrinsic hallucinations differs across layers, hidden state types, and tasks. It also considers probing as a potentially efficient alternative to language model hallucination evaluation when model states are available. Sky Wang is a Ph.D. candidate in Computer Science at Columbia University whose research focuses on Natural Language Processing, Computational Social Science, and mechanistic interpretability.
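To make the probing idea in the overview concrete, here is a minimal sketch of a linear probe: a logistic-regression classifier trained on hidden-state vectors to predict a binary "hallucinated" label. It is an illustration only, not the method presented in the lecture. The hidden states are synthetic random vectors, and every name in it (make_synthetic_states, train_probe, the 8-dimensional state size, the fixed signal direction) is a hypothetical choice made for this example.

```python
import math
import random


def make_synthetic_states(n, dim, seed=0):
    """Build toy (state, label) pairs standing in for hidden states.

    Assumption for illustration: label 1 ("hallucinated") shifts the
    state along a fixed direction, label 0 shifts it the opposite way.
    Real hidden states would come from a language model.
    """
    rng = random.Random(seed)
    direction = [1.0 if i % 2 == 0 else -1.0 for i in range(dim)]
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        shift = 1.0 if label else -1.0
        state = [rng.gauss(0.0, 1.0) + shift * d for d in direction]
        data.append((state, label))
    return data


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    """Probe output: estimated probability that the state is labeled 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_probe(data, dim, epochs=20, lr=0.1):
    """Fit a linear probe with plain stochastic gradient descent."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            grad = score(w, b, x) - y  # gradient of log loss w.r.t. the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def accuracy(w, b, data):
    """Fraction of examples where the thresholded probe matches the label."""
    correct = sum((score(w, b, x) >= 0.5) == bool(y) for x, y in data)
    return correct / len(data)


# Generate one dataset and split it so train and test share a distribution.
all_data = make_synthetic_states(300, dim=8, seed=0)
train, test = all_data[:200], all_data[200:]

w, b = train_probe(train, dim=8)
print("held-out probe accuracy:", accuracy(w, b, test))
```

In this toy setup the training and held-out examples come from the same synthetic distribution. The overview's point about ecological validity concerns the harder case where a probe is trained on one kind of hallucination (synthetic) and evaluated on another (organic), which this sketch does not model.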

Syllabus

Do Androids Know They’re Only Dreaming of Electric Sheep?


Taught by

USC Information Sciences Institute

Related Courses

Microsoft Bot Framework and Conversation as a Platform
Microsoft via edX
Unlocking the Power of OpenAI for Startups - Microsoft for Startups
Microsoft via YouTube
Improving Customer Experiences with Speech to Text and Text to Speech
Microsoft via YouTube
Stanford Seminar - Deep Learning in Speech Recognition
Stanford University via YouTube
Select Topics in Python: Natural Language Processing
Codio via Coursera