How to Evaluate LLM Performance for Domain-Specific Use Cases

Offered By: Snorkel AI via YouTube

Tags

Generative AI Courses, Benchmarking Courses, Snorkel AI Courses

Course Description

Overview

Explore the critical aspects of evaluating large language model (LLM) performance for enterprise use cases in this 57-minute video presentation. Examine the nuances of LLM evaluation, learn techniques for assessing response accuracy at scale, and discover methods for identifying where a model needs additional fine-tuning. Gain insight into common challenges and approaches in LLM evaluation, understand why data-centric evaluation methods matter, and see practical demonstrations of evaluation techniques on Snorkel AI's platform. Experts cover topics ranging from open-source (OSS) benchmarks and metrics to using LLMs as judges, and explain how data slices can improve the evaluation process. Demos show real-world applications: chatbot evaluation, data annotation, and quality-model building in Snorkel Flow.

Syllabus

Agenda
Why do we need LLM evaluation?
Common evaluation axes
Why eval is more critical in Gen AI use cases
Why enterprises are often blocked on effective LLM evaluation
Common approaches to LLM evaluation
OSS benchmarks + metrics
LLM-as-a-judge
Annotation strategies
How can we do better than manual annotation strategies?
How data slices enable better LLM evaluation
How does LLM eval work with Snorkel?
Building a quality model
Using fine-grained benchmarks for next steps
Workflow overview review
Workflow—starting with the model
Workflow—Using an LLM as a judge
Workflow—the quality model
Chatbot demo
Annotating data in Snorkel Flow demo
Building labeling functions in Snorkel Flow demo
LLM evaluation in Snorkel Flow demo
Snorkel Flow Jupyter notebook demo
Data slices in Snorkel Flow demo
Recap
Snorkel eval offer!
Q&A


Taught by

Snorkel AI
