Performance Analysis of Apache Spark and Presto in Cloud Environments
Offered By: Databricks via YouTube
Course Description
Overview
Explore an in-depth performance analysis of Apache Spark and Presto in cloud environments in this 37-minute conference talk. It covers the performance and cost of both big data analytics systems running on Amazon EMR, with a special focus on Apache Spark's performance on the Databricks Unified Analytics Platform. Learn about the TPC-DS benchmark results, SQL performance comparisons, and the advantages and disadvantages of each solution. The quantitative data and analysis can inform your decisions when deploying data analytics at scale, help you avoid common pitfalls, and guide the optimization of your cloud-based big data infrastructure.
Syllabus
Intro
About BSC
TPC-DS Benchmark Work
Context and motivation
Systems Under Test (SUTs)
Hardware configuration
Software configuration System Runtime 5.5
Benchmark execution time (base)
Cost-Based Optimizer (CBO) stats
Benchmark execution time (stats)
Speedup with table and column stats
Additional configuration for Presto
TPC-DS Power Test - Query 72
Dynamic data partitioning
Benchmark exec. time (part + stats)
Speedup with partitioning and stats
TPC Benchmark total execution time
TPC Benchmark DS metric
System costs
TPC Benchmark DS cost
TPC-DS price-performance
Usability and developer productivity
Conclusions
Taught by
Databricks
Related Courses
A Hands-On Look at Amazon Q Business Expert
Amazon Web Services via AWS Skill Builder
À la découverte des télécommunications
Institut Mines-Télécom via France Université Numerique
A Tour of Google Cloud Sustainability
Google via Google Cloud Skills Boost
Intel® Telco Cloud Academy
Intel via Coursera
Accéder à Internet depuis Lambda dans un VPC (Français) | Accessing the Internet from Lambda in a VPC (French)
Amazon Web Services via AWS Skill Builder