Accelerate and Autoscale Deep Learning Inference on GPUs with KFServing

Offered By: CNCF [Cloud Native Computing Foundation] via YouTube

Tags

Conference Talks, TensorFlow, PyTorch, Knative, Autoscaling, Hardware Acceleration, Deep Learning Inference

Course Description

Overview

Explore how to accelerate and autoscale deep learning inference on GPUs using KFServing in this 37-minute conference talk from KubeCon + CloudNativeCon Europe 2021. Learn about the challenges of serving large-scale language models like BERT and GPT-2 in real-time applications, and discover how KFServing provides a simple model serving interface for common model servers.

Gain insights into Bloomberg's use of KFServing for deploying BERT models trained on specialized financial news data, addressing scalability, latency, and throughput issues with Knative's Autoscaler and Activator. Delve into performance debugging tips and examine GPU benchmark results for TensorFlow and PyTorch BERT models deployed to KFServing. Understand how KFServing enables hardware acceleration and autoscaling for improved deep learning inference performance in production environments.
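As a rough illustration of the deployment pattern the talk covers, a KFServing InferenceService manifest along these lines requests a GPU for the predictor and sets a Knative concurrency target so the Autoscaler scales replicas with load. This is a minimal sketch, not material from the talk: the model name and storage URI are placeholders, and the exact API group/version depends on the KFServing (now KServe) release in use.

```yaml
# Hypothetical sketch -- name, storageUri, and target values are placeholders.
apiVersion: serving.kubeflow.org/v1beta1   # KFServing-era API group; newer KServe uses serving.kserve.io
kind: InferenceService
metadata:
  name: bert-example
  annotations:
    # Knative autoscaling hint: aim for ~1 in-flight request per pod,
    # so the Autoscaler adds GPU-backed replicas as traffic grows.
    autoscaling.knative.dev/target: "1"
spec:
  predictor:
    tensorflow:
      storageUri: "gs://example-bucket/bert/saved_model"  # placeholder model path
      resources:
        limits:
          nvidia.com/gpu: 1   # schedule the predictor onto a GPU node
```

With a spec like this, scale-to-zero and request-driven scale-up are handled by Knative's Activator and Autoscaler rather than by hand-tuned replica counts, which is the mechanism the talk examines for GPU-backed BERT serving.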

Syllabus

Accelerate and Autoscale Deep Learning Inference on GPUs with KFServing - Dan Sun


Taught by

CNCF [Cloud Native Computing Foundation]

Related Courses

Designing Highly Scalable Web Apps on Google Cloud Platform
Google via Coursera
Elastic Google Cloud Infrastructure: Scaling and Automation
Google Cloud via Coursera
Elastic Cloud Infrastructure: Scaling and Automation auf Deutsch
Google Cloud via Coursera
Elastic Cloud Infrastructure: Scaling and Automation en Français
Google Cloud via Coursera
Alibaba Cloud Native Solutions and Container Service
Alibaba via Coursera