YoVDO

Understanding LLM Benchmark Quality - Who Watches the Watchmen?

Offered By: DevConf via YouTube

Tags

Model Evaluation Courses
AI Ethics Courses
Model Selection Courses
Hugging Face Courses

Course Description

Overview

Explore the complexities of evaluating Large Language Models (LLMs) in this 30-minute conference talk from DevConf.US 2024. Delve into the world of LLM benchmarks and leaderboards with speaker Erik Erlandson as he examines their effectiveness in measuring model performance. Gain insights into the challenges of assessing LLM outputs, including factual correctness, user safety, and social sensitivity. Learn about the limitations of current benchmarking methods and their ability to capture the full spectrum of human language variations. Discover how to critically evaluate benchmark scores and their relevance to specific applications. Leave equipped with the knowledge to make informed decisions when selecting LLMs for your projects, looking beyond leaderboard rankings to ask pertinent questions about model quality and performance.

Syllabus

Who Watches the Watchmen? Understanding LLM Benchmark Quality - DevConf.US 2024


Taught by

DevConf

Related Courses

Macroeconometric Forecasting
International Monetary Fund via edX
Machine Learning With Big Data
University of California, San Diego via Coursera
Data Science at Scale - Capstone Project
University of Washington via Coursera
Structural Equation Model and its Applications | 结构方程模型及其应用 (粤语)
The Chinese University of Hong Kong via Coursera
Data Science in Action - Building a Predictive Churn Model
SAP Learning