Building Robust Data Pipelines for Modern Data Engineering - End-to-End Project
Offered By: CodeWithYu via YouTube
Course Description
Overview
Work through an end-to-end data engineering project in this nearly two-hour video tutorial. Learn to build robust data pipelines using Apache Spark, Azure Databricks, and Data Build Tool (DBT), with Azure as the cloud provider. Follow along as the instructor guides you through data ingestion into a lakehouse, data integration with Azure Data Factory, and data transformation using Databricks and DBT. Gain hands-on experience setting up resource groups, implementing a medallion architecture, configuring Azure Key Vault for secure secret management, and orchestrating data pipelines. Explore the integration of Azure Databricks with Key Vault and Data Factory, then move on to DBT setup, configuration, and advanced features such as snapshots and data marts. By the end of this tutorial, you'll have a solid understanding of modern data engineering practices and be equipped to build scalable, efficient data pipelines in the cloud.
Syllabus
Introduction
System Architecture
Creating resource groups on Azure
Setting up the medallion architecture storage account
Setting up Azure Data Factory
Azure Key Vault setup for secrets
Azure database with automatic data population
Azure Data Factory pipeline orchestration
Setting up Databricks
Azure Databricks Secret Scope and Key Vault
Verifying Databricks - Key Vault - Secret Scope Integration
Azure Data Factory - Databricks Integration
DBT Setup
DBT Configuration with Azure Databricks
DBT Snapshots with Azure Databricks and ADLS Gen2
DBT Data Marts with Azure Databricks and ADLS Gen2
DBT Documentation
Outro
Taught by
CodeWithYu
Related Courses
Azure Data Engineer con Databricks y Azure Data Factory - Coursera Project Network via Coursera
Operationalizing Microsoft Azure AI Solutions - Pluralsight
Building Your First ETL Pipeline Using Azure Databricks - Pluralsight
Implementing an Azure Databricks Environment in Microsoft Azure - Pluralsight
Building Batch Data Processing Solutions in Microsoft Azure - Pluralsight