Performance Analysis of Apache Spark and Presto in Cloud Environments
Offered By: Databricks via YouTube
Course Description
Overview
Explore an in-depth performance analysis of Apache Spark and Presto in cloud environments through this 37-minute conference talk. Gain valuable insights into the performance and cost considerations of these big data analytics systems running on Amazon EMR, with a special focus on Apache Spark's performance on the Databricks Unified Analytics Platform. Learn about the TPC-DS benchmark results, SQL performance comparisons, and the advantages and disadvantages of each solution. Discover quantitative data and expert analysis to help inform your decision-making process when deploying data analytics at scale, avoiding common pitfalls, and optimizing your cloud-based big data infrastructure.
Syllabus
Intro
About BSC
TPC-DS Benchmark Work
Context and motivation
Systems Under Test (SUTs)
Hardware configuration
Software configuration System Runtime 5.5
Benchmark execution time (base)
Cost-Based Optimizer (CBO) stats
Benchmark execution time (stats)
Speedup with table and column stats
Additional configuration for Presto
TPC-DS Power Test - Query 72
Dynamic data partitioning
Benchmark exec. time (part + stats)
Speedup with partitioning and stats
TPC Benchmark total execution time
TPC Benchmark DS metric
System costs
TPC Benchmark DS cost
TPC-DS price-performance
Usability and developer productivity
Conclusions
Taught by
Databricks
Related Courses
Understanding China, 1700-2000: A Data Analytic Approach, Part 1
The Hong Kong University of Science and Technology via Coursera
The Analytics Edge
Massachusetts Institute of Technology via edX
大数据与信息传播 Big Data and Information Dissemination
Fudan University via Coursera
The Future of Fashion
Marist College via Independent
The Mobile Consumer
Marist College via Independent