Presto on Apache Spark - A Tale of Two Computation Engines
Offered By: Databricks via YouTube
Course Description
Overview
Explore the architectural tradeoffs between map/reduce and parallel databases in this 25-minute conference talk from Databricks. Dive deep into the architectures of Presto and Apache Spark, focusing on key differentiators like disaggregated shuffle. Learn about the Presto-on-Spark project, a specialized Data Frame application that combines Presto's low-latency evaluation with Spark's robust execution engine. Discover the motivation, design, and current status of this initiative aimed at enabling a unified SQL experience for both interactive and batch use cases. Gain insights into Facebook's experience scaling both Presto and Spark for large-scale batch workloads, and understand the potential for greater collaboration between the Spark and Presto communities.
Syllabus
Intro
SQL Use Cases @ Facebook
Towards a Unified SQL Experience
Presto and Spark Architecture
Why Presto (or Other MPPs) Doesn't Scale?
Presto Unlimited
Why Presto-on-Spark
Presto-on-Spark Design Principles
Planning
Translating to RDD
Columnar Format to Row Format Conversion
Broadcast Join
Spark DAG
Execution
Threading Model
Classloader Isolation
Current Status
Taught by
Databricks
Related Courses
Microsoft Azure Exam DP-200 - Implementing an Azure Data Solution
A Cloud Guru
Microsoft Azure Exam DP-201 - Designing an Azure Data Solution
A Cloud Guru
Microsoft Certified: Azure Data Engineer Associate (DP-203)
A Cloud Guru
Traduciendo texto con Amazon Translate
Coursera Project Network via Coursera
Apprentissage automatique dans le cloud avec AWS Batch (Français) | Machine Learning in the Cloud with AWS Batch (French)
Amazon Web Services via AWS Skill Builder