Microsoft Azure Data Engineer Associate (DP-203) Cert Prep: 2 Design and Develop Data Processing by Microsoft Press
Offered By: LinkedIn Learning
Course Description
Overview
Explore the fundamental concepts and skills required to design and develop data processing solutions, in preparation for the Microsoft Azure Data Engineer Associate (DP-203) certification exam.
Syllabus
1. Ingest and Transform Data
- Learning objectives
- Transform data by using Apache Spark (see the sketch after this lesson list)
- Transform data by using Transact-SQL
- Transform data by using Data Factory
- Transform data by using Azure Synapse pipelines
- Transform data by using Stream Analytics
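
For the "Transform data by using Apache Spark" lesson above, here is a minimal PySpark sketch of a batch transformation: read raw CSV files, derive a column, and write curated Parquet. The storage account, container paths, and column names are illustrative assumptions, not taken from the course.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("transform-sales").getOrCreate()

# Read raw CSV files from a hypothetical Data Lake container.
raw = (spark.read
       .option("header", True)
       .option("inferSchema", True)
       .csv("abfss://raw@mydatalake.dfs.core.windows.net/sales/"))

# Derive a line total and keep only positive rows.
transformed = (raw
               .withColumn("order_date", F.to_date("order_date"))
               .withColumn("line_total", F.col("quantity") * F.col("unit_price"))
               .filter(F.col("line_total") > 0))

# Write the curated output as Parquet.
(transformed.write
 .mode("overwrite")
 .parquet("abfss://curated@mydatalake.dfs.core.windows.net/sales/"))
```
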
- Learning objectives
- Cleanse data
- Split data
- Shred JSON (see the sketch after this lesson list)
- Encode and decode data
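
A sketch of the "Shred JSON" lesson topic: flatten a nested JSON payload into rows and columns with PySpark's from_json and explode. The sample payload and schema are invented for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, ArrayType, DoubleType

spark = SparkSession.builder.appName("shred-json").getOrCreate()

# One raw JSON document per row, as it might arrive from a queue or landing zone.
payload = '{"orderId": "42", "items": [{"sku": "A1", "price": 9.99}, {"sku": "B2", "price": 4.5}]}'
df = spark.createDataFrame([(payload,)], ["body"])

# Declare the expected structure of the document.
schema = StructType([
    StructField("orderId", StringType()),
    StructField("items", ArrayType(StructType([
        StructField("sku", StringType()),
        StructField("price", DoubleType()),
    ]))),
])

# Parse the JSON, explode the array, and flatten to one row per item.
shredded = (df
            .withColumn("json", F.from_json("body", schema))
            .select("json.orderId", F.explode("json.items").alias("item"))
            .select("orderId", "item.sku", "item.price"))

shredded.show()  # two rows: (42, A1, 9.99) and (42, B2, 4.5)
```
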
- Learning objectives
- Configure error handling for the transformation
- Normalize and denormalize values (see the sketch after this lesson list)
- Transform data by using Scala
- Perform data exploratory analysis
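
For the "Normalize and denormalize values" lesson, a minimal min-max normalization sketch in PySpark, scaled = (x - min) / (max - min); the sample values are made up.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("normalize").getOrCreate()

df = spark.createDataFrame([(1, 10.0), (2, 25.0), (3, 40.0)], ["id", "amount"])

# Compute the column min and max once, then scale every row.
stats = df.agg(F.min("amount").alias("mn"), F.max("amount").alias("mx")).first()
normalized = df.withColumn(
    "amount_scaled",
    (F.col("amount") - F.lit(stats["mn"])) / F.lit(stats["mx"] - stats["mn"]))

normalized.show()  # 10.0 -> 0.0, 25.0 -> 0.5, 40.0 -> 1.0
```
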
- Learning objectives
- Develop batch processing solutions by using Data Factory, Data Lake, Spark, Azure Synapse pipelines, PolyBase, and Azure Databricks
- Create data pipelines
- Design and implement incremental data loads (see the sketch after this lesson list)
- Design and develop slowly changing dimensions
- Handle security and compliance requirements
- Scale resources
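
A sketch of the "Design and implement incremental data loads" lesson: a high-water-mark pattern that only copies rows modified since the last successful run. The paths, the modified_at column, and the hard-coded watermark value are assumptions for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("incremental-load").getOrCreate()

# High-water mark from the previous successful run (hard-coded here; in practice
# it would come from a control table or pipeline variable).
last_watermark = "2024-01-01T00:00:00"

source = spark.read.parquet("abfss://raw@mydatalake.dfs.core.windows.net/orders/")

# Keep only rows changed since the last run, then append them to the target.
delta_rows = source.filter(F.col("modified_at") > F.lit(last_watermark))
(delta_rows.write
 .mode("append")
 .parquet("abfss://curated@mydatalake.dfs.core.windows.net/orders/"))

# Persist the new high-water mark for the next run.
new_watermark = delta_rows.agg(F.max("modified_at")).first()[0]
```
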
- Learning objectives
- Configure the batch size
- Design and create tests for data pipelines
- Integrate Jupyter and Python notebooks into a data pipeline
- Handle duplicate data
- Handle missing data (see the sketch after this lesson list)
- Handle late-arriving data
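
For the "Handle duplicate data" and "Handle missing data" lessons, a small PySpark sketch that deduplicates on a business key and imputes missing values; the sample rows and default values are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("clean-batch").getOrCreate()

df = spark.createDataFrame(
    [(1, "EU", 10.0), (1, "EU", 10.0), (2, None, None)],
    ["order_id", "region", "amount"])

cleaned = (df
           .dropDuplicates(["order_id"])                   # keep one row per business key
           .fillna({"region": "UNKNOWN", "amount": 0.0}))  # impute missing values

cleaned.show()
```
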
- Learning objectives
- Upsert data (see the sketch after this lesson list)
- Regress to a previous state
- Design and configure exception handling
- Configure batch retention
- Revisit batch processing solution design
- Debug Spark jobs by using the Spark UI
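
A sketch of the "Upsert data" lesson using a Delta Lake MERGE, assuming the delta-spark package is available (as on Azure Databricks or Synapse Spark pools); the table path and key column are illustrative.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("upsert").getOrCreate()

# Incoming batch of changes.
updates = spark.createDataFrame(
    [(1, "shipped"), (4, "created")], ["order_id", "status"])

target = DeltaTable.forPath(
    spark, "abfss://curated@mydatalake.dfs.core.windows.net/orders_delta/")

(target.alias("t")
 .merge(updates.alias("s"), "t.order_id = s.order_id")
 .whenMatchedUpdateAll()      # existing keys: update
 .whenNotMatchedInsertAll()   # new keys: insert
 .execute())
```

Delta tables also keep a version history, and reading an earlier version with the versionAsOf option is one way to approach the "Regress to a previous state" lesson.
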
- Learning objectives
- Develop a stream processing solution by using Stream Analytics, Azure Databricks, and Azure Event Hubs
- Process data by using Spark structured streaming
- Monitor for performance and functional regressions
- Design and create windowed aggregates (see the sketch after this lesson list)
- Handle schema drift
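
For the "Design and create windowed aggregates" lesson, a Spark Structured Streaming sketch of a tumbling-window count. The built-in rate source stands in for Event Hubs so the example is self-contained; the window and watermark durations are arbitrary.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("windowed-agg").getOrCreate()

# Synthetic stream: the rate source emits (timestamp, value) rows.
events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

# Count events per 30-second tumbling window, tolerating 1 minute of lateness.
counts = (events
          .withWatermark("timestamp", "1 minute")
          .groupBy(F.window("timestamp", "30 seconds"))
          .agg(F.count("*").alias("event_count")))

query = (counts.writeStream
         .outputMode("update")
         .format("console")
         .start())
# query.awaitTermination()
```
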
- Learning objectives
- Process time series data
- Process across partitions
- Process within one partition
- Configure checkpoints and watermarking during processing (see the sketch after this lesson list)
- Scale resources
- Design and create tests for data pipelines
- Optimize pipelines for analytical or transactional purposes
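
A sketch for the "Configure checkpoints and watermarking during processing" lesson: the watermark bounds how late events may arrive, and the checkpoint location lets the query resume after an interruption. The checkpoint path is an illustrative assumption.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("checkpointed-stream").getOrCreate()

readings = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# Average per 1-minute window; events up to 10 minutes late are still counted.
per_minute = (readings
              .withWatermark("timestamp", "10 minutes")
              .groupBy(F.window("timestamp", "1 minute"))
              .agg(F.avg("value").alias("avg_value")))

query = (per_minute.writeStream
         .outputMode("append")
         .option("checkpointLocation",
                 "abfss://checkpoints@mydatalake.dfs.core.windows.net/readings/")  # restart state lives here
         .format("console")
         .start())
```
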
- Learning objectives
- Handle interruptions
- Design and configure exception handling
- Upsert data
- Replay archived stream data (see the sketch after this lesson list)
- Design a stream processing solution
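
For the "Replay archived stream data" lesson, a sketch that re-reads Event Hubs Capture output (Avro files in the data lake) as a batch and pushes it through the same parsing logic, assuming the spark-avro package is on the cluster; the capture path pattern and JSON schema are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("replay-archive").getOrCreate()

# Event Hubs Capture writes Avro files under a date/partition folder hierarchy.
archive = spark.read.format("avro").load(
    "abfss://capture@mydatalake.dfs.core.windows.net/myhub/*/*/*/*/*/*")

schema = StructType([
    StructField("deviceId", StringType()),
    StructField("temperature", DoubleType()),
])

# The original event payload sits in the binary 'body' column.
replayed = (archive
            .select(F.col("body").cast("string").alias("json"))
            .select(F.from_json("json", schema).alias("e"))
            .select("e.deviceId", "e.temperature"))

replayed.write.mode("append").parquet(
    "abfss://curated@mydatalake.dfs.core.windows.net/telemetry/")
```
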
- Learning objectives
- Trigger batches
- Handle failed batch loads
- Validate batch loads (see the sketch after this lesson list)
- Manage data pipelines in Data Factory and Synapse pipelines
- Schedule data pipelines in Data Factory and Synapse pipelines
- Implement version control for pipeline artifacts
- Manage Spark jobs in a pipeline
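
A sketch for the "Validate batch loads" lesson: compare source and target row counts and raise if they diverge, so the notebook or Spark job activity fails and Data Factory or Synapse pipelines can branch to an on-failure path. The paths are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("validate-load").getOrCreate()

source_count = spark.read.parquet(
    "abfss://raw@mydatalake.dfs.core.windows.net/orders/").count()
target_count = spark.read.parquet(
    "abfss://curated@mydatalake.dfs.core.windows.net/orders/").count()

# A raised exception fails the activity, which the pipeline can handle.
if source_count != target_count:
    raise ValueError(
        f"Row count mismatch: source={source_count}, target={target_count}")

print("Batch load validated.")
```
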
Taught by
Microsoft Press and Tim Warner
Related Courses
- In-Memory Data Management (openHPI)
- CS115x: Advanced Apache Spark for Data Science and Data Engineering (University of California, Berkeley via edX)
- Processing Big Data with Azure Data Lake Analytics (Microsoft via edX)
- Google Cloud Big Data and Machine Learning Fundamentals en Español (Google Cloud via Coursera)
- Google Cloud Big Data and Machine Learning Fundamentals 日本語版 (Google Cloud via Coursera)