How to Evaluate LLM Performance for Domain-Specific Use Cases
Offered By: Snorkel AI via YouTube
Course Description
Overview
Syllabus
Agenda
Why do we need LLM evaluation?
Common evaluation axes
Why eval is more critical in Gen AI use cases
Why enterprises are often blocked on effective LLM evaluation
Common approaches to LLM evaluation
OSS benchmarks + metrics
LLM-as-a-judge (see the sketch after this agenda)
Annotation strategies
How can we do better than manual annotation strategies?
How data slices enable better LLM evaluation
How does LLM eval work with Snorkel?
Building a quality model
Using fine-grained benchmarks for next steps
Workflow overview review
Workflow—starting with the model
Workflow—Using an LLM as a judge
Workflow—the quality model
Chatbot demo
Annotating data in Snorkel Flow demo
Building labeling functions in Snorkel Flow demo
LLM evaluation in Snorkel Flow demo
Snorkel Flow Jupyter notebook demo
Data slices in Snorkel Flow demo
Recap
Snorkel eval offer!
Q&A
Taught by
Snorkel AI
Related Courses
Building and Managing Superior Skills
State University of New York via Coursera
ChatGPT et IA : mode d'emploi pour managers et RH (ChatGPT and AI: A Guide for Managers and HR)
CNAM via France Université Numérique
Digital Skills: Artificial Intelligence
Accenture via FutureLearn
AI Foundations for Everyone
IBM via Coursera
Design a Feminist Chatbot
Institute of Coding via FutureLearn