Introduction to Spark SQL and DataFrames
Offered By: LinkedIn Learning
Course Description
Overview
Learn about DataFrames, a widely used data structure in Apache Spark. Discover how to manipulate and analyze distributed data with the DataFrames API and SQL.
Syllabus
Introduction
- Apache Spark SQL and data analysis
- What you should know
- Introduction to DataFrames
- SQL for DataFrames
- Install Spark
- Install PySpark
- Using Jupyter notebooks with PySpark
- Set up a Jupyter notebook
- Load data into DataFrames: CSV Files
- Load data into DataFrames: JSON Files
- Basic DataFrame operations
- Filter data with DataFrame API
- Aggregate data with DataFrame API
- Sample data from DataFrames
- Save data from DataFrames
- Querying DataFrames with SQL
- Filtering DataFrames with SQL
- Aggregating data with SQL
- Joining DataFrames with SQL
- Eliminating duplicates in DataFrames
- Working with NA values in DataFrames
- Exploratory data analysis with DataFrames
- Exploratory data analysis with Spark SQL
- Timeseries analysis with DataFrames
- Basic machine learning with DataFrames, part 1
- Basic machine learning with DataFrames, part 2
- Next steps
Taught by
Dan Sullivan
Related Courses
- Introduction to Databases (Meta via Coursera)
- Web Development (Udacity)
- Introduction to Data Science (University of Washington via Coursera)
- Datenmanagement mit SQL (openHPI)
- Sabermetrics 101: Introduction to Baseball Analytics (Boston University via edX)