Introduction to Spark SQL and DataFrames
Offered By: LinkedIn Learning
Course Description
Overview
Learn about DataFrames, a widely used data structure in Apache Spark. Discover how to manipulate and analyze distributed data with the DataFrames API and SQL.
Syllabus
Introduction
- Apache Spark SQL and data analysis
- What you should know
- Introduction to DataFrames
- SQL for DataFrames
- Install Spark
- Install PySpark
- Using Jupyter notebooks with PySpark
- Set up a Jupyter notebook
- Load data into DataFrames: CSV Files
- Load data into DataFrames: JSON Files
- Basic DataFrame operations
- Filter data with DataFrame API
- Aggregate data with DataFrame API
- Sample data from DataFrames
- Save data from DataFrames
- Querying DataFrames with SQL
- Filtering DataFrames with SQL
- Aggregating data with SQL
- Joining DataFrames with SQL
- Eliminating duplicates in DataFrames
- Working with NA values in DataFrames
- Exploratory data analysis with DataFrames
- Exploratory data analysis with Spark SQL
- Timeseries analysis with DataFrames
- Basic machine learning with DataFrames, part 1
- Basic machine learning with DataFrames, part 2
- Next steps
Taught by
Dan Sullivan
Related Courses
- Introduction to Artificial Intelligence, Stanford University via Udacity
- Natural Language Processing, Columbia University via Coursera
- Probabilistic Graphical Models 1: Representation, Stanford University via Coursera
- Computer Vision: The Fundamentals, University of California, Berkeley via Coursera
- Learning from Data (Introductory Machine Learning course), California Institute of Technology via Independent