YoVDO

LLM Evaluation: Challenges and Best Practices - MLOps Podcast #210

Offered By: MLOps.community via YouTube

Tags

MLOps Courses, Fine-Tuning Courses

Course Description

Overview

Explore the intricacies of large language model (LLM) evaluation in this 56-minute podcast featuring Aparna Dhinakaran, Co-Founder and Chief Product Officer of Arize AI. Delve into the complexities of LLM assessment, the significance of the Phoenix evaluations library, and the importance of tailoring evaluations to the software application at hand. Examine the nuances of AI fine-tuning, weigh the merits of open-source versus private models, and understand why deploying models into production quickly helps identify bottlenecks early. Learn about evaluating the relevance of retrieved information and the legitimacy of model output, and about the operational advantages of Phoenix in supporting LLM evaluations. Gain insights from Dhinakaran's extensive experience in ML infrastructure and AI observability as she discusses real-world challenges and solutions in LLM implementation and evaluation.

Syllabus

AI in Production Conference
Aparna's preferred coffee
Takeaways
Shout out to the Arize team for being a sponsor of the MLOps Community since 2020!
Please like, share, and subscribe to our MLOps channels!
Evaluation space
Chatbots Prevent Misinformation
Evaluating AI response based on factual retrieval
Balancing eval response and impact on speed
Context length, placement, and information recall study
GPT-4 excels, prompt iterations affect outcomes
Multiple sub-steps and requiring visibility on application calls
Evaluate calls, breakdown, score, and application evaluation
Rata classification for effective evaluation Research
Benchmarks on Hugging Face and Twitter reliability
Power of observability and retrieval embeddings
Tweaking data points
Hot take
Bottlenecks and errors from rapid production


Taught by

MLOps.community

Related Courses

Machine Learning Operations (MLOps): Getting Started
Google Cloud via Coursera
Design and Implementation of Machine Learning Systems (Проектирование и реализация систем машинного обучения)
Higher School of Economics via Coursera
Demystifying Machine Learning Operations (MLOps)
Pluralsight
Machine Learning Engineer with Microsoft Azure
Microsoft via Udacity
Machine Learning Engineering for Production (MLOps)
DeepLearning.AI via Coursera