YoVDO

Introduction to Spark SQL and DataFrames

Offered By: LinkedIn Learning

Tags

Apache Spark Courses, Data Analysis Courses, Machine Learning Courses, SQL Courses, Jupyter Notebooks Courses, Data Engineering Courses, Exploratory Data Analysis Courses, DataFrames Courses, Spark SQL Courses

Course Description

Overview

Learn about DataFrames, a widely used data structure in Apache Spark. Discover how to manipulate and analyze distributed data with the DataFrame API and SQL.

Syllabus

Introduction
  • Apache Spark SQL and data analysis
  • What you should know
1. Introduction to Spark DataFrames
  • Introduction to DataFrames
  • SQL for DataFrames
2. Installing Spark
  • Install Spark
  • Install PySpark
  • Using Jupyter notebooks with PySpark
3. Getting Started with Spark DataFrames
  • Set up a Jupyter notebook
  • Load data into DataFrames: CSV Files
  • Load data into DataFrames: JSON Files
  • Basic DataFrame operations
  • Filter data with DataFrame API
  • Aggregate data with DataFrame API
  • Sample data from DataFrames
  • Save data from DataFrames
4. SQL for DataFrames
  • Querying DataFrames with SQL
  • Filtering DataFrames with SQL
  • Aggregating data with SQL
  • Joining DataFrames with SQL
  • Eliminating duplicates in DataFrames
  • Working with NA values in DataFrames
5. Data Analysis with Spark
  • Exploratory data analysis with DataFrames
  • Exploratory data analysis with Spark SQL
  • Time series analysis with DataFrames
  • Basic machine learning with DataFrames, part 1
  • Basic machine learning with DataFrames, part 2
Conclusion
  • Next steps

Taught by

Dan Sullivan

Related Courses

  • Big Data (University of Adelaide via edX)
  • Advanced Data Science with IBM (IBM via Coursera)
  • Analysing Unstructured Data using MongoDB and PySpark (Coursera Project Network via Coursera)
  • Apache Spark for Data Engineering and Machine Learning (IBM via edX)
  • Apache Spark™ SQL for Data Analysts (Databricks via Coursera)