Understanding LLM Benchmark Quality - Who Watches the Watchmen?
Offered By: DevConf via YouTube
Course Description
Overview
Explore the complexities of evaluating Large Language Models (LLMs) in this 30-minute conference talk from DevConf.US 2024. Delve into the world of LLM benchmarks and leaderboards with speaker Erik Erlandson as he examines their effectiveness in measuring model performance. Gain insights into the challenges of assessing LLM outputs, including factual correctness, user safety, and social sensitivity. Learn about the limitations of current benchmarking methods and their ability to capture the full spectrum of human language variations. Discover how to critically evaluate benchmark scores and their relevance to specific applications. Leave equipped with the knowledge to make informed decisions when selecting LLMs for your projects, looking beyond leaderboard rankings to ask pertinent questions about model quality and performance.
Syllabus
Who Watches the Watchmen? Understanding LLM Benchmark Quality - DevConf.US 2024
Taught by
DevConf
Related Courses
Hugging Face on Azure - Partnership and Solutions AnnouncementMicrosoft via YouTube Question Answering in Azure AI - Custom and Prebuilt Solutions - Episode 49
Microsoft via YouTube Open Source Platforms for MLOps
Duke University via Coursera Masked Language Modelling - Retraining BERT with Hugging Face Trainer - Coding Tutorial
rupert ai via YouTube Masked Language Modelling with Hugging Face - Microsoft Sentence Completion - Coding Tutorial
rupert ai via YouTube