Optimizing Cost and Performance with Arm64

Offered By: USENIX via YouTube

Tags

SREcon, Data Storage, Telemetry, Cost Optimization, Service-Level Objectives, Arm64

Course Description

Overview

Explore the journey of Honeycomb.io, a Series B startup in the observability space, as it evaluated and adopted the arm64 processor architecture to optimize the cost and performance of its telemetry ingest and indexing workload. Follow the process from setting up the evaluation through the full migration and the improvements made to the ecosystem over the course of a year. Learn how 92% of all compute workloads were migrated to arm64, cutting compute costs by 40% and modestly improving end-user-visible latency. Discover the roadblocks encountered along the way, including incomplete software compatibility, hidden performance quirks, and added complexity. Gain insight into the history of processor architectures, the efficiency of ARM, and how Service Level Objectives (SLOs) map to user flows. Explore the service architecture, including the Shepherd ingest API service and Retriever, and the steps taken to migrate production environments. Examine how AWS instance availability and Kafka affected the migration. Conclude with lessons learned: set measurable goals, acknowledge hidden risks, take care of your people, and optimize for safety in large-scale migrations.

Syllabus

Intro
WTF is architecture? Why multiarch?
History: 80s, 90s, 00s, 10s, and beyond
If it ain't broke...
ARM is more efficient
Data storage engine and analytics tool
Service Level Objectives (SLO)
SLOs are user flows
Same reliability, lower costs with Arm64
Complexity stayed manageable
Prod: customers observe data
Kibble observes dogfood
Dogfood observes prod
Service Architecture
Shepherd: ingest API service
Is it feasible to migrate?
Producing artifacts for Arm64
Initial findings
A/B testing
Dogfood Shepherd cost reduction
Migrated prod Shepherd
Migrated prod Retriever
AWS ran out of m6gd spot instances
Kafka + the long tail
Graviton2 going strong
Have a measurable goal in mind
Acknowledge hidden risks
Take care of your people
Optimize for safety
Graviton2 blog posts


Taught by

USENIX
