Enable Generative AI Everywhere with Ubiquitous Hardware and Open Software
Offered By: Linux Foundation via YouTube
Course Description
Overview
Explore optimization techniques for generative AI and large language models (LLMs) in this conference talk. Learn strategies for reducing inference latency and improving performance, including low-precision inference, Flash Attention and efficient attention via scaled dot product attention (SDPA), optimized KV cache access, and kernel fusion. Discover how these optimizations, implemented in PyTorch and Intel Extension for PyTorch, can significantly improve model efficiency on CPU servers with 4th Gen Intel Xeon Scalable processors. Gain insight into scaling model inference up and out across multiple devices with tensor parallelism, enabling deployment of generative AI on a wide range of hardware configurations.
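As a rough illustration of two of the techniques the talk covers, the sketch below combines low-precision (bfloat16) inference with PyTorch's fused scaled dot product attention. This is a minimal example using stock PyTorch APIs, not the speaker's code; Intel Extension for PyTorch layers additional CPU-specific optimizations on top of these primitives. Tensor shapes and sizes here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def attention_block(q, k, v):
    # One fused SDPA call replaces the explicit softmax(QK^T / sqrt(d)) @ V
    # sequence; PyTorch dispatches to a Flash/memory-efficient kernel when
    # one is available for the device and dtype, avoiding materializing the
    # full seq x seq attention matrix.
    return F.scaled_dot_product_attention(q, k, v)

# Illustrative shapes: (batch, heads, sequence length, head dimension).
batch, heads, seq, head_dim = 1, 8, 128, 64

# Low-precision inference: hold activations in bfloat16, which 4th Gen
# Xeon CPUs accelerate natively via AMX/AVX-512 instructions.
q = torch.randn(batch, heads, seq, head_dim, dtype=torch.bfloat16)
k = torch.randn(batch, heads, seq, head_dim, dtype=torch.bfloat16)
v = torch.randn(batch, heads, seq, head_dim, dtype=torch.bfloat16)

with torch.inference_mode():
    out = attention_block(q, k, v)
```

In a real decoder, the `k` and `v` tensors would be read from a KV cache that grows by one position per generated token, which is where the talk's "optimized KV cache access" applies.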
Syllabus
Enable Generative AI Everywhere with Ubiquitous Hardware and Open Software - Guobing Chen, Intel
Taught by
Linux Foundation
Related Courses
Aptitude Test: How Do You Get a High Score?
Rwaq (رواق)
Browser Rendering Optimization
Google via Udacity
Fundamentals of Computer Systems (I): Program Representation, Conversion, and Linking
Nanjing University via Coursera
Managing as a Coach
University of California, Davis via Coursera
Drive an Operational Plan to Success
OpenLearning