Perform data engineering with Azure Synapse Apache Spark Pools
Offered By: Microsoft via Microsoft Learn
Course Description
Overview
- Module 1: Understand big data engineering with Apache Spark in Azure Synapse Analytics
  After completing this module, you will be able to:
  - Differentiate between Apache Spark and Spark pools
  - Differentiate between Azure Databricks and Spark pools
  - Differentiate between HDInsight and Spark pools
  - Differentiate between Spark pools and SQL pools
  - Understand the use cases of data engineering with Apache Spark in Azure Synapse Analytics
  - Create a Spark pool in Azure Synapse Analytics
- Module 2: Ingest data with Apache Spark notebooks in Azure Synapse Analytics
  After completing this module, you will be able to:
  - Understand the use cases for Spark notebooks
  - Create a Spark notebook in Azure Synapse Analytics
  - Understand the supported languages in Spark notebooks
  - Develop Spark notebooks
  - Run Spark notebooks
  - Load data in Spark notebooks
  - Save Spark notebooks
- Module 3: Transform data with DataFrames in Apache Spark pools in Azure Synapse Analytics
  After completing this module, you will be able to:
  - Understand DataFrames in Spark pools in Azure Synapse Analytics
  - Load data into a Spark DataFrame
  - Create a Spark table
  - Write data to and from a storage account
  - Load a streaming DataFrame into Apache Spark
  - Flatten nested structures and explode arrays with Apache Spark
- Module 4: Integrate SQL and Apache Spark pools in Azure Synapse Analytics
  After completing this module, you will be able to:
  - Describe the integration methods between SQL and Spark pools in Azure Synapse Analytics
  - Understand the use cases for SQL and Spark pools integration
  - Authenticate in Azure Synapse Analytics
  - Transfer data between SQL and Spark pools in Azure Synapse Analytics
  - Authenticate between Spark and SQL pools in Azure Synapse Analytics
  - Integrate SQL and Spark pools in Azure Synapse Analytics
  - Externalize the use of Spark pools within the Azure Synapse workspace
  - Transfer data outside the Synapse workspace using SQL authentication
  - Transfer data outside the Synapse workspace using the PySpark connector
  - Transform data in Apache Spark and write it back to a SQL pool in Azure Synapse Analytics
- Module 5: Monitor and manage data engineering workloads with Apache Spark in Azure Synapse Analytics
  After completing this module, you will be able to:
  - Monitor Spark pools in Azure Synapse Analytics
  - Understand resource utilization of Spark pools in Azure Synapse Analytics
  - Monitor query activity of Spark pools in Azure Synapse Analytics
  - Baseline Apache Spark performance with the Apache Spark History Server in Azure Synapse Analytics
  - Optimize Apache Spark jobs in Azure Synapse Analytics
  - Automate scaling of Apache Spark pools in Azure Synapse Analytics
Syllabus
- Module 1: Understand big data engineering with Apache Spark in Azure Synapse Analytics
  - Introduction
  - What is an Apache Spark pool in Azure Synapse Analytics?
  - How do Apache Spark pools work in Azure Synapse Analytics?
  - When do you use Apache Spark pools in Azure Synapse Analytics?
  - Knowledge check
  - Summary
- Module 2: Ingest data with Apache Spark notebooks in Azure Synapse Analytics
  - Introduction
  - Introduction to Spark notebooks
  - Understand the use cases for Spark notebooks
  - Exercise: Create a Spark notebook in Azure Synapse Analytics
  - Discover supported languages in Spark notebooks
  - Develop Spark notebooks
  - Exercise: Develop Spark notebooks
  - Run Spark notebooks
  - Exercise: Run Spark notebooks
  - Load data in Spark notebooks
  - Exercise: Load data in Spark notebooks
  - Save Spark notebooks
  - Knowledge check
  - Summary
- Module 3: Transform data with DataFrames in Apache Spark pools in Azure Synapse Analytics
  - Introduction
  - Introduction to DataFrames in Spark pools in Azure Synapse Analytics
  - Load data into a Spark DataFrame
  - Exercise: Load data into a Spark DataFrame
  - Exercise: Create a Spark table
  - Flatten nested structures and explode arrays with Apache Spark
  - Exercise: Flatten nested structures and explode arrays with Apache Spark in Synapse
  - Knowledge check
  - Summary
- Module 4: Integrate SQL and Apache Spark pools in Azure Synapse Analytics
  - Introduction
  - Describe the integration methods between SQL and Spark pools in Azure Synapse Analytics
  - Understand the use cases for SQL and Spark pools integration
  - Authenticate in Azure Synapse Analytics
  - Transfer data between SQL and Spark pools in Azure Synapse Analytics
  - Authenticate between Spark and SQL pools in Azure Synapse Analytics
  - Exercise: Integrate SQL and Spark pools in Azure Synapse Analytics
  - Externalize the use of Spark pools within the Azure Synapse workspace
  - Transfer data outside the Synapse workspace using the PySpark connector
  - Knowledge check
  - Summary
- Module 5: Monitor and manage data engineering workloads with Apache Spark in Azure Synapse Analytics
  - Introduction
  - Monitor Spark pools in Azure Synapse Analytics
  - Baseline Apache Spark performance with the Apache Spark History Server in Azure Synapse Analytics
  - Optimize Apache Spark jobs in Azure Synapse Analytics
  - Automate scaling of Apache Spark pools in Azure Synapse Analytics
  - Knowledge check
  - Summary