Introduction to Spark Datasets
Offered By: Scala Days Conferences via YouTube
Course Description
Overview
Explore Apache Spark's Dataset API in this 43-minute conference talk from Scala Days Copenhagen 2017. The talk covers the basics of working with Spark Datasets, a hybrid approach that combines functional and relational programming. Learn about Spark's components, including machine learning and streaming, and how they are being rewritten to support Dataset-compatible APIs. See the performance and space-efficiency benefits of Spark SQL, and follow examples of loading JSON data, applying a schema with a case class, and writing relational transformations. Understand what the optimizer can do with those transformations and how to mix functional and relational styles effectively. Examine windowed operations and window specifications, and see why Datasets are becoming increasingly important in the Spark ecosystem. No prior Spark knowledge is required, but a basic understanding of Scala is recommended.
Syllabus
Intro
What is Spark?
The different pieces of Spark
Why should we consider Spark SQL?
What is the performance like?
How is it so fast?
How much more space efficient?
Getting started
Loading some simple JSON data
Sample case class for schema
Then apply some type magic
What do relational transforms look like?
Writing a relational transformation
What can the optimizer do now?
Using Datasets to mix functional & relational style
And functional style maps
What is DS functional perf like?
Build the recipe for each query
Windowed operations
Window specs
Summary: Why to use Datasets
The next book...
Taught by
Scala Days Conferences
Related Courses
Functional Programming Principles in Scala (École Polytechnique Fédérale de Lausanne via Coursera)
Functional Program Design in Scala (École Polytechnique Fédérale de Lausanne via Coursera)
Parallel programming (École Polytechnique Fédérale de Lausanne via Coursera)
Big Data Analysis with Scala and Spark (École Polytechnique Fédérale de Lausanne via Coursera)
Functional Programming in Scala Capstone (École Polytechnique Fédérale de Lausanne via Coursera)