Perform data engineering with Azure Synapse Apache Spark Pools

Offered By: Microsoft via Microsoft Learn

Course Description

Module 1: Understand big data engineering with Apache Spark in Azure Synapse Analytics

After completing this module, you will be able to:

Differentiate between Apache Spark and Spark pools
Differentiate between Azure Databricks and Spark pools
Differentiate between HDInsight and Spark pools
Differentiate between Spark pools and SQL pools
Understand the use cases for data engineering with Apache Spark in Azure Synapse Analytics
Create a Spark pool in Azure Synapse Analytics

After completing this module, you will be able to:

Module 3: Transform data with DataFrames in Apache Spark Pools in Azure Synapse Analytics

After completing this module, you will be able to:

Module 4: Integrate SQL and Apache Spark pools in Azure Synapse Analytics

After completing this module, you will be able to:

Describe the integration methods between SQL and Spark pools in Azure Synapse Analytics
Understand the use cases for SQL and Spark pools integration
Authenticate in Azure Synapse Analytics
Transfer data between SQL and Spark pools in Azure Synapse Analytics
Authenticate between Spark and SQL pools in Azure Synapse Analytics
Integrate SQL and Spark pools in Azure Synapse Analytics
Externalize the use of Spark pools within the Azure Synapse workspace
Transfer data outside the Synapse workspace using SQL authentication
Transfer data outside the Synapse workspace using the PySpark connector
Transform data in Apache Spark and write it back to a SQL pool in Azure Synapse Analytics

Module 5: Monitor and manage data engineering workloads with Apache Spark in Azure Synapse Analytics

After completing this module, you will be able to:

Monitor Spark pools in Azure Synapse Analytics
Understand resource utilization of Spark pools in Azure Synapse Analytics
Monitor query activity of Spark pools in Azure Synapse Analytics
Baseline Apache Spark performance with Apache Spark History Server in Azure Synapse Analytics
Optimize Apache Spark jobs in Azure Synapse Analytics
Automate scaling of Apache Spark pools in Azure Synapse Analytics

Module 1: Understand big data engineering with Apache Spark in Azure Synapse Analytics

Module 3: Transform data with DataFrames in Apache Spark Pools in Azure Synapse Analytics

Introduction
Introduction to DataFrames in Spark pools in Azure Synapse Analytics
Load data into a Spark DataFrame
Exercise: Load data into a Spark DataFrame
Exercise: Create a Spark table
Flatten nested structures and explode arrays with Apache Spark
Exercise: Flatten nested structures and explode arrays with Apache Spark in Synapse
Knowledge check
Summary

Module 4: Integrate SQL and Apache Spark pools in Azure Synapse Analytics

Introduction
Describe the integration methods between SQL and Spark pools in Azure Synapse Analytics
Understand the use cases for SQL and Spark pools integration
Authenticate in Azure Synapse Analytics
Transfer data between SQL and Spark pools in Azure Synapse Analytics
Authenticate between Spark and SQL pools in Azure Synapse Analytics
Exercise: Integrate SQL and Spark pools in Azure Synapse Analytics
Externalize the use of Spark pools within the Azure Synapse workspace
Transfer data outside the Synapse workspace using the PySpark connector
Knowledge check
Summary

Module 5: Monitor and manage data engineering workloads with Apache Spark in Azure Synapse Analytics

Introduction
Monitor Spark pools in Azure Synapse Analytics
Baseline Apache Spark performance with Apache Spark History Server in Azure Synapse Analytics
Optimize Apache Spark jobs in Azure Synapse Analytics
Automate scaling of Apache Spark pools in Azure Synapse Analytics
Knowledge check
Summary