Microsoft Azure Data Engineer Associate (DP-203) Cert Prep: 2 Design and Develop Data Processing by Microsoft Press

Offered By: LinkedIn Learning

Tags

Microsoft Azure Courses, Apache Spark Courses, Azure Data Factory Courses, Data Processing Courses, Data Engineering Courses

Course Description

Overview

Explore the fundamental concepts and skills required to design and develop data processing solutions, and prepare to pass the Microsoft Azure Data Engineer Associate (DP-203) certification exam.

Syllabus

1. Ingest and Transform Data
  • Learning objectives
  • Transform data by using Apache Spark
  • Transform data by using Transact-SQL
  • Transform data by using Data Factory
  • Transform data by using Azure Synapse pipelines
  • Transform data by using Stream Analytics
2. Work with Transformed Data
  • Learning objectives
  • Cleanse data
  • Split data
  • Shred JSON
  • Encode and decode data
3. Troubleshoot Data Transformations
  • Learning objectives
  • Configure error handling for the transformation
  • Normalize and denormalize values
  • Transform data by using Scala
  • Perform exploratory data analysis
4. Design a Batch Processing Solution
  • Learning objectives
  • Develop batch processing solutions by using Data Factory, Data Lake, Spark, Azure Synapse pipelines, PolyBase, and Azure Databricks
  • Create data pipelines
  • Design and implement incremental data loads
  • Design and develop slowly changing dimensions
  • Handle security and compliance requirements
  • Scale resources
5. Develop a Batch Processing Solution
  • Learning objectives
  • Configure the batch size
  • Design and create tests for data pipelines
  • Integrate Jupyter and Python Notebooks into a data pipeline
  • Handle duplicate data
  • Handle missing data
  • Handle late-arriving data
6. Configure a Batch Processing Solution
  • Learning objectives
  • Upsert data
  • Regress to a previous state
  • Design and configure exception handling
  • Configure batch retention
  • Revisit batch processing solution design
  • Debug Spark jobs by using the Spark UI
7. Design a Stream Processing Solution
  • Learning objectives
  • Develop a stream processing solution by using Stream Analytics, Azure Databricks, and Azure Event Hubs
  • Process data by using Spark structured streaming
  • Monitor for performance and functional regressions
  • Design and create windowed aggregates
  • Handle schema drift
8. Process Data in a Stream Processing Solution
  • Learning objectives
  • Process time series data
  • Process across partitions
  • Process within one partition
  • Configure checkpoints and watermarking during processing
  • Scale resources
  • Design and create tests for data pipelines
  • Optimize pipelines for analytical or transactional purposes
9. Troubleshoot a Stream Processing Solution
  • Learning objectives
  • Handle interruptions
  • Design and configure exception handling
  • Upsert data
  • Replay archived stream data
  • Design a stream processing solution
10. Manage Batches and Pipelines
  • Learning objectives
  • Trigger batches
  • Handle failed batch loads
  • Validate batch loads
  • Manage data pipelines in Data Factory and Synapse pipelines
  • Schedule data pipelines in Data Factory and Synapse pipelines
  • Implement version control for pipeline artifacts
  • Manage Spark jobs in a pipeline

Taught by

Microsoft Press and Tim Warner

Related Courses

CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera