Hardware, Software, Performance and Costs for Llama-2 70b and Mixtral 8x7b LLM Inference with Low Concurrency
Offered By: Linux Foundation via YouTube
Course Description
Overview
Explore the hardware requirements, software configurations, performance, and costs of running unquantized Llama-2 70b and Mixtral 8x7b inference at low concurrency in this 52-minute conference talk. Analyze benchmark data published on GitHub to compare frameworks and hardware setups, both on-premises and cloud-based, in the USA. See how open-source software (OSS) LLMs compare in speed and cost with closed APIs such as OpenAI's. Learn how to configure servers and replicate the benchmarks through practical code examples. Come away with a starting point for deploying high-quality OSS LLM inference in a small organization in 2024, with a focus on hardware and software choices that perform efficiently.
Syllabus
HW, SW, Performance and Costs for Llama-2 70b and Mixtral 8x7b LLM Inference with Low... - Ivan Baldo
Taught by
Linux Foundation