YoVDO

Data Engineering with Databricks

Offered By: Pragmatic AI Labs via edX

Tags

Databricks, Data Visualization, SQL, Apache Spark, Data Transformation, Data Processing, Data Engineering, Cluster Management, Delta Lake, ETL Pipelines

Course Description

Overview

Master Data Engineering on Databricks Lakehouse Platform

  • Learn Databricks architecture, cluster management & notebook analysis
  • Build reliable ETL pipelines with Delta Lake for data transformation
  • Implement advanced data processing techniques with Apache Spark

Course Highlights:

  • Create & scale Databricks clusters for workloads
  • Load data from diverse sources into notebooks
  • Explore, visualize & profile datasets with notebooks
  • Version control & share notebooks via Git integration
  • Read & ingest data in various file formats
  • Transform data with SQL & DataFrame operations
  • Handle complex data types like arrays, structs, timestamps
  • Deduplicate, join & flatten nested data structures
  • Identify & fix data quality issues with UDFs
  • Load cleansed data into Delta Lake for reliability
  • Build production-ready pipelines with Delta Live Tables
  • Schedule & monitor workloads using Databricks Jobs
  • Secure data access with Unity Catalog

Gain comprehensive skills in data engineering on Databricks through hands-on labs, real-world projects, and best practices for the modern data lakehouse.


Syllabus

Module 1: Databricks Lakehouse Platform Fundamentals

  • Introduction to the Databricks Lakehouse Platform and its architecture

  • Creating, managing, and configuring clusters

  • Setting up and using Databricks with IntelliJ, RStudio, and the Databricks CLI

  • Introduction to notebooks, including execution, sharing, and multi-language support

  • Efficient data transformation with Spark SQL and the Catalog Explorer

  • Creating tables from files and querying external data sources

  • Reliable data pipelines with Delta Lake, ACID transactions, and Z-Ordering optimization
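The table-creation and Delta Lake topics above can be sketched in Spark SQL. This is a minimal illustration, not course material; the table name, storage path, schema, and Z-Order column are hypothetical assumptions:

```sql
-- Create a Delta table directly from files in cloud storage
-- (path, format, and table name are hypothetical examples).
CREATE TABLE IF NOT EXISTS sales_bronze
USING DELTA
AS SELECT * FROM read_files('/mnt/raw/sales/*.csv', format => 'csv', header => true);

-- Z-Ordering co-locates related rows in the same files,
-- speeding up queries that filter on the chosen column.
OPTIMIZE sales_bronze ZORDER BY (customer_id);
```

Writes to a Delta table like this one are ACID transactions, which is what makes the pipeline reliable: concurrent readers never see a partially written batch.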

Module 2: Data Transformation and Pipelines

  • Automated pipelines with Delta Live Tables

    • Delta Live Tables components

    • Continuous vs triggered pipelines

    • Configuring Auto Loader

    • Querying pipeline events

    • End-to-end example of Delta Live Tables

    • Vacuum and garbage collection

  • Orchestrating workloads with Databricks Jobs

    • Multi-task workflows and task dependencies

    • Viewing job history

    • Using dashboards

    • Handling failures and configuring retries

  • Unified data access with Unity Catalog

    • Catalogs vs metastores

    • Unity Catalog quickstart in Python

    • Applying object security

    • Best practices for catalogs, connections, and business units
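The Delta Live Tables and Unity Catalog topics above can be sketched declaratively. This is an illustrative fragment under assumed names (the landing path, table names, constraint, catalog, schema, and group are all hypothetical), not an excerpt from the course:

```sql
-- Delta Live Tables: declare a streaming table fed by Auto Loader
-- (cloud_files is the Auto Loader source; path is hypothetical).
CREATE OR REFRESH STREAMING TABLE orders_raw
AS SELECT * FROM cloud_files('/mnt/landing/orders', 'json');

-- A downstream live table with a data-quality expectation:
-- rows with a NULL order_id are dropped on violation.
CREATE OR REFRESH LIVE TABLE orders_clean
(CONSTRAINT valid_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW)
AS SELECT * FROM LIVE.orders_raw;

-- Unity Catalog: grant read access on the cleansed table to a group,
-- using the three-level catalog.schema.table namespace.
GRANT SELECT ON TABLE main.sales.orders_clean TO `data_analysts`;
```

The pipeline runtime resolves the `LIVE.` references into a dependency graph and runs the tables in order, which is how multi-step workflows stay consistent without hand-written orchestration.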


Taught by

Noah Gift and Alfredo Deza

Related Courses

Big Data Essentials
A Cloud Guru
Big Data
University of Adelaide via edX
Advanced Data Science with IBM
IBM via Coursera
Amazon EMR Getting Started (Indonesian)
Amazon Web Services via AWS Skill Builder
Lab - Analyze and Prepare Data with Amazon SageMaker Data Wrangler and Amazon EMR (Portuguese (Brazil))
Amazon Web Services via AWS Skill Builder