YoVDO

Evaluating LLM Applications - Insights from Shahul Es

Offered By: MLOps.community via YouTube

Tags

MLOps Courses, Ragas Courses, Fine-Tuning Courses, Prompt Injection Courses

Course Description

Overview

In this 51-minute podcast episode, Shahul Es, a data science expert and Kaggle Grandmaster, discusses how to evaluate Large Language Model (LLM) applications. The conversation covers debugging techniques and troubleshooting strategies, the challenges of benchmarking open-source models, custom data distributions, the role of fine-tuning in improving model performance, and the Ragas project. It also addresses evaluation metrics, the impact of gamed leaderboards, and how to recommend an effective evaluation process, along with prompt injection, alignment, and the principle of "garbage in, garbage out" in LLM applications. The listing also points to ways of connecting with the MLOps community and to additional resources, including job boards and merchandise.

Syllabus

Shahul's preferred coffee
Takeaways
Please like, share, and subscribe to our MLOps channels!
Shahul's definition of evaluation
Evaluation metrics and benchmarks
Gamed leaderboards
Best open-source models for summarizing long text
Benchmarks
Recommending an evaluation process
LLMs for other LLMs
Debugging failed evaluation models
Prompt injection
Alignment
Open Assist
Garbage in, garbage out
Ragas
Valuable use cases besides OpenAI
Fine-tuning LLMs
Connect with Shahul (@Shahules786 on Twitter) if you need help with Ragas
Wrap up


Taught by

MLOps.community

Related Courses

AI CTF Solutions - DEFCon31 Hackathon and Kaggle Competition
Rob Mulla via YouTube
Indirect Prompt Injections in the Wild - Real World Exploits and Mitigations
Ekoparty Security Conference via YouTube
Hacking Neural Networks - Introduction and Current Techniques
media.ccc.de via YouTube
The Curious Case of the Rogue SOAR - Vulnerabilities and Exploits in Security Automation
nullcon via YouTube
Mastering Large Language Model Evaluations - Techniques for Ensuring Generative AI Reliability
Data Science Dojo via YouTube