Performance Analysis of Apache Spark and Presto in Cloud Environments
Offered By: Databricks via YouTube
Course Description
Overview
This 37-minute conference talk presents an in-depth performance analysis of Apache Spark and Presto in cloud environments. It compares the performance and cost of both big data analytics systems running on Amazon EMR, with particular attention to Apache Spark's performance on the Databricks Unified Analytics Platform. Topics include TPC-DS benchmark results, SQL performance comparisons, and the advantages and disadvantages of each solution. The quantitative results and analysis are meant to inform decisions about deploying data analytics at scale, avoiding common pitfalls, and optimizing cloud-based big data infrastructure.
Syllabus
Intro
About BSC
TPC-DS Benchmark Work
Context and motivation
Systems Under Test (SUTs)
Hardware configuration
Software configuration: System Runtime 5.5
Benchmark execution time (base)
Cost-Based Optimizer (CBO) stats
Benchmark execution time (stats)
Speedup with table and column stats
Additional configuration for Presto
TPC-DS Power Test - Query 72
Dynamic data partitioning
Benchmark execution time (partitioning + stats)
Speedup with partitioning and stats
TPC Benchmark total execution time
TPC Benchmark DS metric
System costs
TPC Benchmark DS cost
TPC-DS price-performance
Usability and developer productivity
Conclusions
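The syllabus items on the TPC-DS metric, system costs, and price-performance build on the benchmark's composite metric. The sketch below is a minimal illustration assuming the TPC-DS v2 definition of QphDS@SF as we understand it (all times in hours, `streams` being the number of query streams Sq per throughput test); the talk itself may use a simplified or modified variant. The function names and the sample inputs are hypothetical and are not results from the talk.

```python
import math


def qphds_at_sf(sf, streams, t_power, t_tt1, t_tt2, t_dm1, t_dm2, t_load):
    """Sketch of the TPC-DS v2 QphDS@SF composite metric (assumed definition).

    All times are in hours; `streams` is the number of concurrent query
    streams (Sq) used in each throughput test.
    """
    q = streams * 99                # 99 query templates per stream
    t_pt = t_power * streams        # power test time, scaled by stream count
    t_tt = t_tt1 + t_tt2            # combined time of both throughput tests
    t_dm = t_dm1 + t_dm2            # combined time of both data maintenance runs
    t_ld = 0.01 * streams * t_load  # load time, weighted at 1% per stream
    # Fourth root of the product = geometric mean of the four timing terms.
    return math.floor(sf * q / (t_pt * t_tt * t_dm * t_ld) ** 0.25)


def price_performance(total_cost, qphds):
    """Hypothetical helper: total system cost per unit of QphDS@SF (lower is better)."""
    return total_cost / qphds


# Hypothetical inputs for illustration only; not measurements from the talk.
metric = qphds_at_sf(sf=1000, streams=4, t_power=0.5,
                     t_tt1=1.0, t_tt2=1.0, t_dm1=0.5, t_dm2=0.5, t_load=25.0)
print(metric, price_performance(140007.0, metric))
```

Because the timing terms are combined through a geometric mean, a large speedup in one phase (for example, the power test after adding table and column statistics) raises the metric less than a proportional speedup across all phases would. Dividing a cluster's total cost for the run by the metric then gives a price-performance figure that can be compared across systems.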
Taught by
Databricks