Performance Analysis of Apache Spark and Presto in Cloud Environments
Offered By: Databricks via YouTube
Course Description
Overview
Explore an in-depth performance analysis of Apache Spark and Presto in cloud environments through this 37-minute conference talk. Gain insight into the performance and cost of these big data analytics systems running on Amazon EMR, with a special focus on Apache Spark's performance on the Databricks Unified Analytics Platform. Learn about the TPC-DS benchmark results, SQL performance comparisons, and the advantages and disadvantages of each solution. The quantitative data and expert analysis can inform your decisions when deploying data analytics at scale, help you avoid common pitfalls, and guide the optimization of your cloud-based big data infrastructure.
Syllabus
Intro
About BSC
TPC-DS Benchmark Work
Context and motivation
Systems Under Test (SUTs)
Hardware configuration
Software configuration System Runtime 5.5
Benchmark execution time (base)
Cost-Based Optimizer (CBO) stats
Benchmark execution time (stats)
Speedup with table and column stats
Additional configuration for Presto
TPC-DS Power Test - Query 72
Dynamic data partitioning
Benchmark execution time (partitioning + stats)
Speedup with partitioning and stats
TPC Benchmark total execution time
TPC Benchmark DS metric
System costs
TPC Benchmark DS cost
TPC-DS price-performance
Usability and developer productivity
Conclusions
Taught by
Databricks