Implement a data lakehouse analytics solution with Azure Databricks
Offered By: Microsoft via Microsoft Learn
Course Description
Overview
- Module 1: Describe Azure Databricks
- Understand the Azure Databricks platform
- Create your own Azure Databricks workspace
- Create a notebook inside your home folder in Databricks
- Understand the fundamentals of Apache Spark notebooks
- Create, or attach to, a Spark cluster
- Identify the types of tasks well-suited to the unified analytics engine Apache Spark
- Module 2: Spark architecture fundamentals
- Understand the architecture of an Azure Databricks Spark cluster
- Understand the architecture of a Spark job
- Module 3: Read and write data in Azure Databricks
- Use Azure Databricks to read multiple file types, both with and without a schema.
- Combine inputs from files and data stores, such as Azure SQL Database.
- Transform and store that data for advanced analytics.
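A minimal PySpark sketch of the Module 3 objectives above, assuming it runs in a Databricks notebook where `spark` is predefined; the file paths, schema, JDBC connection details, and join columns are illustrative placeholders rather than values from the course.

```python
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Read CSV without a schema (Spark infers the types) and with an explicit schema.
inferred_df = (spark.read
               .option("header", True)
               .option("inferSchema", True)
               .csv("/mnt/raw/people.csv"))

schema = StructType([
    StructField("id", IntegerType(), False),
    StructField("name", StringType(), True),
])
typed_df = spark.read.option("header", True).schema(schema).csv("/mnt/raw/people.csv")

# Combine with a table read from Azure SQL Database over JDBC (placeholder connection).
orders_df = (spark.read.format("jdbc")
             .option("url", "jdbc:sqlserver://<server>.database.windows.net;database=<db>")
             .option("dbtable", "dbo.Orders")
             .option("user", "<user>")
             .option("password", "<password>")
             .load())

# Transform and store the combined result for downstream analytics.
combined = typed_df.join(orders_df, typed_df.id == orders_df.customer_id)
combined.write.mode("overwrite").parquet("/mnt/curated/orders_by_person")
```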
- Module 4: Work with DataFrames in Azure Databricks
- Use the count() method to count rows in a DataFrame
- Use the display() function to display a DataFrame in the Notebook
- Cache a DataFrame for quicker operations if the data is needed a second time
- Use the limit() function to display a small set of rows from a larger DataFrame
- Use select() to select a subset of columns from a DataFrame
- Use distinct() and dropDuplicates() to remove duplicate data
- Use drop() to remove columns from a DataFrame
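The DataFrame methods Module 4 lists can be seen together in the short sketch below; it assumes a Databricks notebook (where `spark` and `display` are available) and uses a tiny hard-coded DataFrame rather than course data.

```python
df = spark.createDataFrame(
    [(1, "Ada", "UK"), (2, "Grace", "US"), (2, "Grace", "US")],
    ["id", "name", "country"],
)

df.count()                      # count rows (an action, so it executes immediately)
display(df)                     # render the DataFrame in the notebook
df.cache()                      # keep the data in memory for repeated use
display(df.limit(2))            # inspect a small set of rows from a larger DataFrame
df.select("name", "country")    # choose a subset of columns
df.distinct()                   # drop fully duplicated rows
df.dropDuplicates(["id"])       # drop duplicates based on chosen columns
df.drop("country")              # remove a column
```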
- Module 5: Describe lazy evaluation and other performance features in Azure Databricks
- Describe the difference between eager and lazy execution
- Define and identify transformations
- Define and identify actions
- Describe the fundamentals of how the Catalyst Optimizer works
- Differentiate between wide and narrow transformations
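A short illustration of the eager-versus-lazy distinction Module 5 describes (the data and column names are made up): the transformations only build a logical plan, the Catalyst Optimizer rewrites it, and nothing runs until an action is called.

```python
df = spark.range(1_000_000)                                   # a single `id` column

evens   = df.where("id % 2 = 0")                              # narrow transformation, lazy
doubled = evens.selectExpr("id * 2 AS doubled")               # still lazy
buckets = (doubled
           .groupBy((doubled.doubled % 10).alias("bucket"))   # wide transformation: requires a shuffle
           .count())

buckets.explain()   # show the optimized physical plan produced by Catalyst
buckets.show()      # an action: only now does Spark actually execute the chain
```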
- Module 6: Work with DataFrames columns in Azure Databricks
- Learn the syntax for specifying column values for filtering and aggregations
- Understand the use of the Column class
- Sort and filter a DataFrame based on column values
- Use collect() and take() to return records from a DataFrame to the driver of the cluster
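A sketch of the Column-based filtering and sorting covered in Module 6, ending with `take()` and `collect()`; the sample names and columns are placeholders, and it assumes a notebook where `spark` exists.

```python
from pyspark.sql.functions import col

people = spark.createDataFrame(
    [("George", "Washington", 57), ("Martha", "Washington", 57), ("John", "Adams", 61)],
    ["firstName", "lastName", "age"],
)

washingtons = (people
               .filter(col("lastName") == "Washington")   # a Column comparison builds a predicate
               .orderBy(col("age").desc()))                # sort by a Column expression

washingtons.take(1)     # return only the first row to the driver
washingtons.collect()   # return all matching rows to the driver; keep results small
```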
- Module 7: Work with DataFrames advanced methods in Azure Databricks
- Manipulate date and time values in Azure Databricks
- Rename columns in Azure Databricks
- Aggregate data in Azure Databricks DataFrames
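A compact sketch of Module 7's date manipulation, column renaming, and aggregation; the event data and column names are placeholders.

```python
from pyspark.sql.functions import to_timestamp, date_format, avg, count

events = spark.createDataFrame(
    [("2024-01-01 10:15:00", "web", 3.0),
     ("2024-01-01 11:45:00", "web", 5.0),
     ("2024-01-02 09:00:00", "mobile", 7.0)],
    ["ts", "channel", "value"],
)

events = (events
          .withColumn("ts", to_timestamp("ts"))                  # parse strings into timestamps
          .withColumn("day", date_format("ts", "yyyy-MM-dd"))    # derive a date string column
          .withColumnRenamed("value", "amount"))                 # rename a column

daily = (events.groupBy("day", "channel")
         .agg(count("*").alias("events"), avg("amount").alias("avg_amount")))
```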
- Module 8: Describe platform architecture, security, and data protection in Azure Databricks
- Learn the Azure Databricks platform architecture and how it is secured.
- Use Azure Key Vault to store secrets used by Azure Databricks and other services.
- Access Azure Storage with Key Vault-based secrets.
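A sketch of the Key Vault-backed secret flow Module 8 describes: the secret scope name, secret key, and storage account below are hypothetical placeholders, and the code assumes a Databricks notebook where `dbutils` is available.

```python
# Read a storage account key from a Key Vault-backed secret scope.
storage_account = "<storage-account-name>"                       # placeholder
storage_key = dbutils.secrets.get(scope="key-vault-secrets",     # placeholder scope name
                                  key="storage-account-key")     # placeholder secret name

# Configure Spark to authenticate to the storage account with the retrieved key.
spark.conf.set(f"fs.azure.account.key.{storage_account}.dfs.core.windows.net", storage_key)

# Access Azure Storage through the ABFS driver.
df = spark.read.parquet(f"abfss://data@{storage_account}.dfs.core.windows.net/raw/")
```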
- Module 9: Build and query a Delta Lake
- Learn about the key features and use cases of Delta Lake.
- Use Delta Lake to create, append, and upsert tables.
- Perform optimizations in Delta Lake.
- Compare different versions of a Delta table using Time Machine.
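A compact sketch of the Delta Lake operations Module 9 names (create, append, upsert, optimize, and reading an earlier version); the path, table contents, and merge condition are illustrative only.

```python
from delta.tables import DeltaTable

path = "/tmp/delta/customers"                                    # placeholder location

initial = spark.createDataFrame([(1, "Ada"), (2, "Grace")], ["id", "name"])
more    = spark.createDataFrame([(3, "Edsger")], ["id", "name"])
changes = spark.createDataFrame([(2, "Grace Hopper"), (4, "Barbara")], ["id", "name"])

initial.write.format("delta").mode("overwrite").save(path)       # create
more.write.format("delta").mode("append").save(path)             # append

# Upsert: update rows whose ids match, insert the rest.
target = DeltaTable.forPath(spark, path)
(target.alias("t")
 .merge(changes.alias("c"), "t.id = c.id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())

# Compact small files, then read an earlier version of the table (time travel).
spark.sql(f"OPTIMIZE delta.`{path}`")
v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)
```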
- Module 10: Process streaming data with Azure Databricks structured streaming
- Learn the key features and uses of Structured Streaming.
- Stream data from a file and write it out to a distributed file system.
- Use sliding windows to aggregate over chunks of data rather than all data.
- Apply watermarking to discard stale data that you do not have space to keep.
- Connect to Event Hubs to read and write streams.
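A sketch of the Structured Streaming pattern from Module 10: stream files in, bound late data with a watermark, aggregate over a sliding window, and write the stream back out. Paths and the schema are placeholders, and a real Event Hubs source would use the separately installed Event Hubs connector rather than the file source shown here.

```python
from pyspark.sql.functions import window, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

schema = StructType([
    StructField("device", StringType()),
    StructField("eventTime", TimestampType()),
])

stream = (spark.readStream
          .schema(schema)                      # streaming file sources need an explicit schema
          .json("/mnt/stream/input/"))

counts = (stream
          .withWatermark("eventTime", "10 minutes")                     # drop state for data older than 10 minutes
          .groupBy(window(col("eventTime"), "10 minutes", "5 minutes"),  # sliding window
                   col("device"))
          .count())

query = (counts.writeStream
         .format("delta")
         .outputMode("append")                 # append is valid because the watermark bounds late data
         .option("checkpointLocation", "/mnt/stream/checkpoints/")
         .start("/mnt/stream/output/"))
```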
- Module 11: Describe Azure Databricks Delta Lake architecture
- Process batch and streaming data with Delta Lake.
- Learn how Delta Lake architecture enables unified streaming and batch analytics with transactional guarantees within a data lake.
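One way to picture the layered flow Module 11 refers to is the sketch below, where each layer is a Delta table so batch and streaming readers can share it; the paths, filter, and aggregation are placeholders.

```python
from pyspark.sql.functions import col

# Bronze: land the raw data as-is.
raw = spark.read.json("/mnt/landing/orders/")
raw.write.format("delta").mode("append").save("/mnt/delta/bronze/orders")

# Silver: clean and conform, read as a stream so new bronze data keeps flowing through.
bronze = spark.readStream.format("delta").load("/mnt/delta/bronze/orders")
silver = bronze.where(col("order_id").isNotNull())
(silver.writeStream
 .format("delta")
 .outputMode("append")
 .option("checkpointLocation", "/mnt/delta/_checkpoints/silver_orders")
 .start("/mnt/delta/silver/orders"))

# Gold: business-level aggregate for reporting.
gold = (spark.read.format("delta").load("/mnt/delta/silver/orders")
        .groupBy("customer_id").count())
gold.write.format("delta").mode("overwrite").save("/mnt/delta/gold/orders_by_customer")
```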
- Module 12: Create production workloads on Azure Databricks with Azure Data Factory
- Create an Azure Data Factory pipeline with a Databricks activity.
- Execute a Databricks notebook with a parameter.
- Retrieve and log a parameter passed back from the notebook.
- Monitor your Data Factory pipeline.
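The notebook half of the Module 12 flow might look like the sketch below: the Data Factory Databricks Notebook activity passes a base parameter in (the name `inputPath` is a placeholder), and the value handed to `dbutils.notebook.exit` comes back to the pipeline in the activity output.

```python
# Declare the widget so the parameter passed by the pipeline can be bound to it.
dbutils.widgets.text("inputPath", "")
input_path = dbutils.widgets.get("inputPath")

# Do some work with the parameter; the dataset is whatever the path points at.
row_count = spark.read.parquet(input_path).count()

# Return a value to Azure Data Factory so the pipeline can retrieve and log it.
dbutils.notebook.exit(str(row_count))
```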
- Module 13: Implement CI/CD with Azure DevOps
- Learn about CI/CD and how it applies to data engineering.
- Use Azure DevOps as a source code repository for Azure Databricks notebooks.
- Create build and release pipelines in Azure DevOps to automatically deploy a notebook from a development to a production Azure Databricks workspace.
- Module 14: Integrate Azure Databricks with Azure Synapse
- Access Azure Synapse Analytics from Azure Databricks by using the SQL Data Warehouse connector.
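A sketch of using the SQL Data Warehouse (Azure Synapse) connector from a Databricks notebook; the JDBC URL, staging location, and table names are placeholders, and the connector stages data through the given `tempDir` in Azure Storage.

```python
jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<dw>;user=<user>;password=<password>"
temp_dir = "abfss://staging@<storage-account>.dfs.core.windows.net/tmp"

# Read a table from Azure Synapse Analytics.
sales = (spark.read
         .format("com.databricks.spark.sqldw")
         .option("url", jdbc_url)
         .option("tempDir", temp_dir)
         .option("forwardSparkAzureStorageCredentials", "true")
         .option("dbTable", "dbo.SalesFact")
         .load())

# Write results back through the same connector.
(sales.write
 .format("com.databricks.spark.sqldw")
 .option("url", jdbc_url)
 .option("tempDir", temp_dir)
 .option("forwardSparkAzureStorageCredentials", "true")
 .option("dbTable", "dbo.SalesFactCopy")
 .mode("overwrite")
 .save())
```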
- Module 15: Describe Azure Databricks best practices
- Workspace administration
- Security
- Tools & integration
- Databricks runtime
- HA/DR
- Clusters
Syllabus
- Module 1: Describe Azure Databricks
- Introduction
- Explain Azure Databricks
- Create an Azure Databricks workspace and cluster
- Understand Azure Databricks Notebooks
- Exercise: Work with Notebooks
- Knowledge check
- Summary
- Module 2: Spark architecture fundamentals
- Introduction
- Understand the architecture of an Azure Databricks Spark cluster
- Understand the architecture of a Spark job
- Knowledge check
- Summary
- Module 3: Read and write data in Azure Databricks
- Introduction
- Read data in CSV format
- Read data in JSON format
- Read data in Parquet format
- Read data stored in tables and views
- Write data
- Exercises: Read and write data
- Knowledge check
- Summary
- Module 4: Work with DataFrames in Azure Databricks
- Introduction
- Describe a DataFrame
- Use common DataFrame methods
- Use the display function
- Exercise: Distinct articles
- Knowledge check
- Summary
- Module 5: Describe lazy evaluation and other performance features in Azure Databricks
- Introduction
- Describe the difference between eager and lazy execution
- Describe the fundamentals of how the Catalyst Optimizer works
- Define and identify actions and transformations
- Describe performance enhancements enabled by shuffle operations and Tungsten
- Knowledge check
- Summary
- Module 6: Work with DataFrames columns in Azure Databricks
- Introduction
- Describe the column class
- Work with column expressions
- Exercise: Washingtons and Marthas
- Knowledge check
- Summary
- Module 7: Work with DataFrames advanced methods in Azure Databricks
- Introduction
- Perform date and time manipulation
- Use aggregate functions
- Exercise: Deduplication of data
- Knowledge check
- Summary
- Module 8: Describe platform architecture, security, and data protection in Azure Databricks
- Introduction
- Describe the Azure Databricks platform architecture
- Perform data protection
- Describe Azure Key Vault and Databricks security scopes
- Secure access with Azure IAM and authentication
- Describe security
- Exercise: Access Azure Storage with Key Vault-backed secrets
- Knowledge check
- Summary
- Module 9: Build and query a Delta Lake
- Introduction
- Describe the open source Delta Lake
- Exercise: Work with basic Delta Lake functionality
- Describe how Azure Databricks manages Delta Lake
- Exercise: Use the Delta Lake Time Machine and perform optimization
- Knowledge check
- Summary
- Module 10: Process streaming data with Azure Databricks structured streaming
- Introduction
- Describe Azure Databricks structured streaming
- Perform stream processing using structured streaming
- Work with Time Windows
- Process data from Event Hubs with structured streaming
- Knowledge check
- Summary
- Module 11: Describe Azure Databricks Delta Lake architecture
- Introduction
- Describe bronze, silver, and gold architecture
- Perform batch and stream processing
- Knowledge check
- Summary
- Module 12: Create production workloads on Azure Databricks with Azure Data Factory
- Introduction
- Schedule Databricks jobs in a data factory pipeline
- Pass parameters into and out of Databricks jobs in data factory
- Knowledge check
- Summary
- Module 13: Implement CI/CD with Azure DevOps
- Introduction
- Describe CI/CD
- Create a CI/CD process with Azure DevOps
- Knowledge check
- Summary
- Module 14: Integrate Azure Databricks with Azure Synapse
- Introduction
- Integrate with Azure Synapse Analytics
- Knowledge check
- Summary
- Module 15: Describe Azure Databricks best practices
- Introduction
- Understand workspace administration best practices
- List security best practices
- Describe tools and integration best practices
- Explain Databricks runtime best practices
- Understand cluster best practices
- Knowledge check
- Summary
Related Courses
- Introduction to Jenkins (Linux Foundation via edX)
- Introduction to Cloud Native, DevOps, Agile, and NoSQL (IBM via edX)
- Learn Azure DevOps CI/CD pipelines (Udemy)
- IBM Full Stack Software Developer (IBM via Coursera)
- DevOps: CI/CD with Jenkins pipelines, Maven, Gradle (Udemy)