Automated Evaluation of LLM Apps with Azure AI-Generative SDK
Offered By: Visual Studio Code via YouTube
Course Description
Overview
Explore automated evaluation techniques for Large Language Model (LLM) applications using the azure-ai-generative SDK in this 27-minute conference talk from Python Data Science Day. Learn about different types of LLM apps, including prompt-only and Retrieval Augmented Generation (RAG) apps. Discover how to assess answer quality, implement LLM Ops, and experiment with quality factors. Dive into the AI RAG Chat Evaluator tool, understand the importance of ground truth data, and explore evaluation approaches. Gain insights into improving ground truth data sets and next steps in LLM app development. Access valuable resources, including slides, demos, and repositories, to enhance your understanding of automated LLM app evaluation.
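The two app shapes covered in the talk can be sketched in a few lines. Below is a minimal illustration, assuming a stubbed LLM call and a toy keyword-overlap retriever (not a real search index or the SDK's own retrieval): a prompt-only app sends the question straight to the model, while a RAG app first retrieves relevant documents and adds them to the prompt.

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real model call (e.g. an Azure OpenAI deployment).
    return f"[answer based on prompt of {len(prompt)} chars]"

def prompt_only_app(question: str) -> str:
    # Prompt-only: the user question goes straight to the model.
    return call_llm(question)

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    # Toy retriever: rank documents by shared words with the question.
    q_words = set(question.lower().split())
    ranked = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def rag_app(question: str, docs: list[str]) -> str:
    # RAG: retrieve supporting sources, then ask the model to answer from them.
    context = "\n".join(retrieve(question, docs))
    prompt = f"Answer using only these sources:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

docs = [
    "The plan covers dental cleanings twice a year.",
    "Parking passes are issued by the facilities office.",
    "Vision coverage includes one eye exam per year.",
]
print(rag_app("Does the plan cover dental cleanings?", docs))
```

The names and retrieval logic here are illustrative only; the talk's demos use a real search index and model deployment.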
Syllabus
Automated evaluation of LLM apps with the azure-ai-generative SDK
Types of LLM apps
Prompt-only LLM app
Retrieval Augmented Generation (RAG) LLM app
RAG flow
Are the answers high quality?
LLM Ops for LLM Apps
Experimenting with quality factors
AI RAG Chat Evaluator: https://aka.ms/rag/eval
Ground truth data
Evaluation
Evaluation approach
Improving ground truth data sets
Next steps
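The evaluation idea running through the syllabus — run each ground truth question through the app, then score the app's answer against the expected one — can be sketched as below. The token-overlap F1 metric is a simple stand-in of my own; the azure-ai-generative SDK instead uses model-graded metrics such as groundedness and relevance, as covered in the talk.

```python
def token_f1(answer: str, truth: str) -> float:
    # Harmonic mean of word-level precision and recall (illustrative metric).
    a, t = set(answer.lower().split()), set(truth.lower().split())
    common = len(a & t)
    if common == 0:
        return 0.0
    precision, recall = common / len(a), common / len(t)
    return 2 * precision * recall / (precision + recall)

def evaluate(app, ground_truth: list[dict]) -> float:
    """Mean score of the app's answers over {question, truth} items."""
    scores = [token_f1(app(item["question"]), item["truth"])
              for item in ground_truth]
    return sum(scores) / len(scores)

# Example with a trivially correct "app" that always gives the right answer:
data = [{"question": "What color is the sky?", "truth": "The sky is blue."}]
echo_app = lambda q: "The sky is blue."
print(evaluate(echo_app, data))  # 1.0
```

Swapping `token_f1` for a model-graded judge (as the AI RAG Chat Evaluator does) keeps the same loop: the ground truth data set drives the questions, and per-answer scores are aggregated to compare experiments.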
Taught by
Visual Studio Code
Related Courses
Data Analysis (Johns Hopkins University via Coursera)
Computing for Data Analysis (Johns Hopkins University via Coursera)
Scientific Computing (University of Washington via Coursera)
Introduction to Data Science (University of Washington via Coursera)
Web Intelligence and Big Data (Indian Institute of Technology Delhi via Coursera)