How to Evaluate LLM Performance for Domain-Specific Use Cases
Offered By: Snorkel AI via YouTube
Course Description
Overview
Syllabus
Agenda
Why do we need LLM evaluation?
Common evaluation axes
Why eval is more critical in Gen AI use cases
Why enterprises are often blocked on effective LLM evaluation
Common approaches to LLM evaluation
OSS benchmarks + metrics
LLM-as-a-judge (see the judge sketch after this agenda)
Annotation strategies
How can we do better than manual annotation strategies?
How data slices enable better LLM evaluation (see the slice-scoring sketch after this agenda)
How does LLM eval work with Snorkel?
Building a quality model
Using fine-grained benchmarks for next steps
Workflow overview review
Workflow—starting with the model
Workflow—using an LLM as a judge
Workflow—the quality model
Chatbot demo
Annotating data in Snorkel Flow demo
Building labeling functions in Snorkel Flow demo (see the labeling-function sketch after this agenda)
LLM evaluation in Snorkel Flow demo
Snorkel Flow Jupyter notebook demo
Data slices in Snorkel Flow demo
Recap
Snorkel eval offer!
Q&A
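Illustrative code sketches
The LLM-as-a-judge agenda items describe scoring model outputs with a second model. The webinar does not publish its judge prompts or model choice, so the following is a minimal sketch assuming the OpenAI Python client (>=1.0); the rubric prompt, the "gpt-4o-mini" model name, and the judge helper are illustrative assumptions, not Snorkel's implementation.

```python
# Minimal LLM-as-a-judge sketch: score a chatbot answer against a rubric
# with a second model acting as the judge. Assumes the openai package (>=1.0)
# and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading a chatbot answer for a domain-specific task.
Question: {question}
Answer: {answer}
Rate the answer from 1 (unusable) to 5 (fully correct and on-policy).
Reply with the number only."""

def judge(question: str, answer: str, model: str = "gpt-4o-mini") -> int:
    """Ask a judge model for a 1-5 quality score for one (question, answer) pair."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
        temperature=0,
    )
    return int(resp.choices[0].message.content.strip())

# Example: score a small batch of responses and report the mean.
pairs = [("What is our refund window?", "Refunds are accepted within 30 days of purchase.")]
scores = [judge(q, a) for q, a in pairs]
print(f"mean judge score: {sum(scores) / len(scores):.2f}")
```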
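The labeling-function demo runs in Snorkel Flow, a commercial platform that is not scriptable here. As a stand-in, this sketch uses the open-source snorkel package's labeling_function and LabelModel APIs to turn a few quality heuristics into noisy votes and combine them into a single label per response; the response column and the heuristics themselves are made up for illustration.

```python
# Labeling-function sketch with the open-source `snorkel` package, standing in
# for the Snorkel Flow demo. Each function votes on response quality or abstains;
# a LabelModel then combines the noisy votes into one label per row.
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, LOW_QUALITY, HIGH_QUALITY = -1, 0, 1

@labeling_function()
def lf_too_short(x):
    # Very short chatbot answers are usually unhelpful.
    return LOW_QUALITY if len(x.response.split()) < 5 else ABSTAIN

@labeling_function()
def lf_cites_policy(x):
    # Answers that point to the relevant policy are usually acceptable.
    return HIGH_QUALITY if "policy" in x.response.lower() else ABSTAIN

# Toy evaluation data; in practice this would be chatbot transcripts.
df = pd.DataFrame({"response": ["See our refund policy: 30 days.", "No.", "Yes, that works for me."]})

# Apply all labeling functions, then combine their votes.
L = PandasLFApplier(lfs=[lf_too_short, lf_cites_policy]).apply(df)
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L)
df["quality_label"] = label_model.predict(L)
print(df)
```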
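Data slices give per-segment metrics instead of a single aggregate score. The sketch below assumes you already have a per-example quality score (for example, from the judge above or from a quality model) and reports the mean score per slice with pandas; the slice definitions and column names are illustrative.

```python
# Data-slice sketch: report quality per slice (e.g., by topic) rather than one
# aggregate number, so weak areas of the model become visible.
import pandas as pd

eval_df = pd.DataFrame({
    "topic":       ["refunds", "refunds", "pricing", "compliance", "compliance"],
    "judge_score": [5, 4, 2, 1, 2],   # per-example scores from a judge or quality model
})

# Define slices as boolean masks over the evaluation set.
slices = {
    "all":              pd.Series(True, index=eval_df.index),
    "topic=refunds":    eval_df["topic"] == "refunds",
    "topic=compliance": eval_df["topic"] == "compliance",
}

# A per-slice report surfaces regressions that an overall average hides.
for name, mask in slices.items():
    subset = eval_df[mask]
    print(f"{name:18s} n={len(subset):2d} mean score={subset['judge_score'].mean():.2f}")
```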
Taught by
Snorkel AI
Related Courses
Solving the Last Mile Problem of Foundation Models with Data-Centric AI (MLOps.community via YouTube)
Foundational Models in Enterprise AI - Challenges and Opportunities (MLOps.community via YouTube)
Knowledge Distillation Demystified: Techniques and Applications (Snorkel AI via YouTube)
Model Distillation - From Large Models to Efficient Enterprise Solutions (Snorkel AI via YouTube)
Curate Training Data via Labeling Functions - 10 to 100x Faster (Snorkel AI via YouTube)