Data Engineering with Databricks

Offered By: Pragmatic AI Labs via edX

Tags

Databricks, Data Visualization, SQL, Apache Spark, Data Transformation, Data Processing, Data Engineering, Cluster Management, Delta Lake, ETL Pipelines

Course Description

Overview

Master Data Engineering on Databricks Lakehouse Platform

  • Learn Databricks architecture, cluster management & notebook analysis
  • Build reliable ETL pipelines with Delta Lake for data transformation
  • Implement advanced data processing techniques with Apache Spark

Course Highlights:

  • Create & scale Databricks clusters for workloads
  • Load data from diverse sources into notebooks
  • Explore, visualize & profile datasets with notebooks
  • Version control & share notebooks via Git integration
  • Read & ingest data in various file formats
  • Transform data with SQL & DataFrame operations
  • Handle complex data types like arrays, structs, timestamps
  • Deduplicate, join & flatten nested data structures
  • Identify & fix data quality issues with UDFs
  • Load cleansed data into Delta Lake for reliability
  • Build production-ready pipelines with Delta Live Tables
  • Schedule & monitor workloads using Databricks Jobs
  • Secure data access with Unity Catalog
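The ingest-transform-load flow in the highlights above can be sketched in Spark SQL. This is a minimal illustration, not course material; the paths, table names, and columns (`raw_events`, `clean_events`, `payload.user_id`) are assumptions:

```sql
-- Hypothetical landing path and table names, for illustration only.
CREATE TABLE IF NOT EXISTS raw_events
USING json
OPTIONS (path '/mnt/landing/events/');

-- Deduplicate, flatten a nested struct, and normalize a timestamp,
-- then persist the cleansed rows to a Delta Lake table.
CREATE OR REPLACE TABLE clean_events
USING DELTA
AS SELECT DISTINCT
     id,
     payload.user_id        AS user_id,    -- flatten a struct field
     to_timestamp(event_ts) AS event_time  -- string -> timestamp
   FROM raw_events;
```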

Gain comprehensive skills in data engineering on Databricks through hands-on labs, real-world projects, and best practices for the modern data lakehouse.


Syllabus

Module 1: Databricks Lakehouse Platform Fundamentals

  • Introduction to the Databricks Lakehouse Platform and its architecture

  • Creating, managing, and configuring clusters

  • Setting up and using Databricks with IntelliJ, RStudio, and the Databricks CLI

  • Introduction to notebooks, including execution, sharing, and multi-language support

  • Efficient data transformation with Spark SQL and the Catalog Explorer

  • Creating tables from files and querying external data sources

  • Reliable data pipelines with Delta Lake, ACID transactions, and Z-Ordering optimization
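The Delta Lake topics in Module 1 (creating tables from files, ACID transactions, Z-Ordering) can be illustrated with a short Databricks SQL sketch; the `sales` table, source path, and column names are hypothetical:

```sql
-- Create a Delta table from Parquet files (hypothetical path).
CREATE TABLE sales
USING DELTA
AS SELECT * FROM parquet.`/mnt/raw/sales/`;

-- ACID upsert: MERGE applies updates and inserts in one atomic commit.
MERGE INTO sales AS t
USING sales_updates AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

-- Z-Ordering co-locates related rows to speed up selective queries.
OPTIMIZE sales ZORDER BY (customer_id);
```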

Module 2: Data Transformation and Pipelines

  • Automated pipelines with Delta Live Tables

    • Delta Live Tables components

    • Continuous vs triggered pipelines

    • Configuring Auto Loader

    • Querying pipeline events

  • End-to-end Delta Live Tables example

    • Vacuum and garbage collection

  • Orchestrating workloads with Databricks Jobs

    • Multi-task workflows and task dependencies

    • Viewing job history

    • Using dashboards

    • Handling failures and configuring retries

  • Unified data access with Unity Catalog

    • Catalogs vs metastores

    • Unity Catalog quickstart in Python

    • Applying object security

    • Best practices for catalogs, connections, and business units
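Two of the Module 2 topics, Auto Loader ingestion in a Delta Live Tables pipeline and object security in Unity Catalog, can be sketched in Databricks SQL. All object names (`bronze_orders`, `main.sales.orders`, `data_analysts`) and the landing path are assumptions for illustration:

```sql
-- Delta Live Tables: incrementally ingest JSON files with Auto Loader.
CREATE OR REFRESH STREAMING LIVE TABLE bronze_orders
COMMENT "Raw orders ingested via cloud_files (Auto Loader)"
AS SELECT * FROM cloud_files('/mnt/landing/orders/', 'json');

-- Unity Catalog: grant read access on a catalog.schema.table object.
GRANT SELECT ON TABLE main.sales.orders TO `data_analysts`;
```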


Taught by

Noah Gift and Alfredo Deza

Related Courses

Achieving Advanced Insights with BigQuery - Français
Google Cloud via Coursera
Database Administration and SQL Language Basics
A Cloud Guru
SQL Deep Dive
A Cloud Guru
Using Python for Data Management and Reporting
A Cloud Guru
Advanced Data Modeling
Meta via Coursera