Koalas: Scaling Pandas APIs on Apache Spark - Performance and Comparison with Dask
Offered By: Databricks via YouTube
Course Description
Overview
Explore the capabilities and performance of Koalas, an open-source project that provides pandas APIs on top of Apache Spark, in this 24-minute talk from Databricks. Learn how Koalas bridges the gap between pandas' data science functionality and Apache Spark's scalability for big data. Compare Koalas with other pandas-scaling libraries, particularly Dask, through benchmarks and performance analysis. The talk covers Koalas' internal frame, execution times, the influence of Spark's Catalyst optimizer, and code generation, and closes with recent updates and the main changes in Koalas.
Syllabus
Introduction
What is Koalas
Internal Frame
Benchmark
Results
Execution Time
Influence of Catalyst
Code Generation
Benchmark Results
What's New
Main Changes
Taught by
Databricks
Related Courses
Excel 2010 (Miríadax)
Intro to Data Science (Udacity)
Data Manipulation at Scale: Systems and Algorithms (University of Washington via Coursera)
Statistical Computing with R - a gentle introduction (University College London via Independent)
Introducción a Data Science: Programación Estadística con R (Universidad Nacional Autónoma de México via Coursera)