Data Engineering with Databricks
Offered By: Pragmatic AI Labs via edX
Course Description
Overview
Master Data Engineering on Databricks Lakehouse Platform
- Learn Databricks architecture, cluster management & notebook analysis
- Build reliable ETL pipelines with Delta Lake for data transformation
- Implement advanced data processing techniques with Apache Spark
Course Highlights:
- Create & scale Databricks clusters for workloads
- Load data from diverse sources into notebooks
- Explore, visualize & profile datasets with notebooks
- Version control & share notebooks via Git integration
- Read & ingest data in various file formats
- Transform data with SQL & DataFrame operations
- Handle complex data types like arrays, structs, timestamps
- Deduplicate, join & flatten nested data structures
- Identify & fix data quality issues with UDFs
- Load cleansed data into Delta Lake for reliability
- Build production-ready pipelines with Delta Live Tables
- Schedule & monitor workloads using Databricks Jobs
- Secure data access with Unity Catalog
Gain comprehensive skills in data engineering on Databricks through hands-on labs, real-world projects, and best practices for the modern data lakehouse.
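As a taste of the "deduplicate, join & flatten nested data structures" skill above, here is a plain-Python sketch of the same idea (no Spark needed); in a Databricks notebook you would typically use `explode()` and `dropDuplicates()` on a DataFrame instead. The record and field names are hypothetical.

```python
# Plain-Python analogy for flattening a nested record and deduplicating rows.
# In Spark this maps to explode() on the array column and dropDuplicates().
orders = [
    {"order_id": 1, "customer": {"id": 10, "name": "Ada"},
     "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
    {"order_id": 1, "customer": {"id": 10, "name": "Ada"},   # duplicate record
     "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
]

def flatten(order):
    """Yield one flat row per item (like exploding an array-of-structs column)."""
    for item in order["items"]:
        yield {
            "order_id": order["order_id"],
            "customer_id": order["customer"]["id"],  # struct field -> flat column
            "sku": item["sku"],
            "qty": item["qty"],
        }

rows = [row for order in orders for row in flatten(order)]

# Deduplicate on all columns, keeping the first occurrence.
seen, deduped = set(), []
for row in rows:
    key = tuple(sorted(row.items()))
    if key not in seen:
        seen.add(key)
        deduped.append(row)
```

The four exploded rows collapse to two after deduplication, mirroring what `dropDuplicates()` would do on the flattened DataFrame.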
Syllabus
Module 1: Databricks Lakehouse Platform Fundamentals
Introduction to the Databricks Lakehouse Platform and its architecture
Creating, managing, and configuring clusters
Setting up and using Databricks with IntelliJ, RStudio, and the Databricks CLI
Introduction to notebooks, including execution, sharing, and multi-language support
Efficient data transformation with Spark SQL and the Catalog Explorer
Creating tables from files and querying external data sources
Reliable data pipelines with Delta Lake, ACID transactions, and Z-Ordering optimization
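As a sketch of the Delta Lake DDL this module covers (table and column names are hypothetical), the statements below create a Delta table, apply Z-Ordering, and vacuum old files. In a notebook each string would be executed with `spark.sql(...)` or in a `%sql` cell; they are held as Python strings here only so the example is self-contained.

```python
# Databricks SQL sketch: Delta table creation, Z-Order optimization, and vacuum.
# Table and column names are hypothetical placeholders.
create_table = """
CREATE TABLE IF NOT EXISTS sales_orders (
    order_id    BIGINT,
    customer_id BIGINT,
    order_ts    TIMESTAMP,
    amount      DOUBLE
) USING DELTA
"""

# Writes to a Delta table are ACID transactions; Z-Ordering co-locates related
# rows in the same files to speed up filters on the chosen columns.
optimize = "OPTIMIZE sales_orders ZORDER BY (customer_id)"

# Remove data files no longer referenced by the table's transaction log
# (168 hours is the default retention threshold).
vacuum = "VACUUM sales_orders RETAIN 168 HOURS"
```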
Module 2: Data Transformation and Pipelines
Automated pipelines with Delta Live Tables
Delta Live Tables components
Continuous vs triggered pipelines
Configuring Auto Loader
Querying pipeline events
End-to-end example of a Delta Live Tables pipeline
Vacuum and garbage collection
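The Delta Live Tables topics above can be sketched as two DLT SQL definitions, a bronze table ingesting with Auto Loader (`cloud_files()` is its SQL entry point) and a silver table reading from it. Paths and table names are hypothetical, and the statements belong in a DLT pipeline notebook, so they are shown here as strings only.

```python
# Delta Live Tables SQL sketch (paths and table names are hypothetical).
# Bronze layer: incremental ingestion with Auto Loader via cloud_files().
bronze = """
CREATE OR REFRESH STREAMING LIVE TABLE raw_orders
AS SELECT * FROM cloud_files('/mnt/landing/orders', 'json')
"""

# Silver layer: stream from the bronze table, cast types, and drop bad rows.
silver = """
CREATE OR REFRESH STREAMING LIVE TABLE clean_orders AS
SELECT order_id, CAST(amount AS DOUBLE) AS amount
FROM STREAM(LIVE.raw_orders)
WHERE order_id IS NOT NULL
"""
```

In a continuous pipeline these tables update as new files land; in a triggered pipeline they update once per run.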
Orchestrating workloads with Databricks Jobs
Multi-task workflows and task dependencies
Viewing job history
Using dashboards
Handling failures and configuring retries
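The workflow topics above (task dependencies, retries) can be sketched as the JSON-shaped payload the Databricks Jobs API 2.1 accepts, written here as a Python dict. Job, task, and notebook names are hypothetical.

```python
# Sketch of a multi-task job definition in the shape accepted by the
# Databricks Jobs API 2.1 (job, task, and notebook names are hypothetical).
job = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/etl/ingest"},
        },
        {
            "task_key": "transform",
            # depends_on makes this task wait for ingest to succeed.
            "depends_on": [{"task_key": "ingest"}],
            "notebook_task": {"notebook_path": "/Repos/etl/transform"},
            "max_retries": 2,                  # retry a failed task twice
            "min_retry_interval_millis": 60_000,
        },
    ],
}
```

A payload like this would be submitted via the REST API or the Databricks CLI, and the resulting runs appear in the job's run history.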
Unified data access with Unity Catalog
Catalogs vs metastores
Unity Catalog quickstart in Python
Applying object security
Best practices for catalogs, connections, and business units
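The Unity Catalog topics above revolve around its three-level namespace (`catalog.schema.table`) and SQL grants. Below is a minimal sketch with hypothetical catalog, schema, and group names; each statement would be run with `spark.sql(...)` or in a `%sql` cell on a Unity Catalog-enabled workspace.

```python
# Unity Catalog sketch: three-level namespace plus object security via GRANT.
# Catalog, schema, table, and group names are hypothetical.
statements = [
    "CREATE CATALOG IF NOT EXISTS finance",
    "CREATE SCHEMA IF NOT EXISTS finance.sales",
    # Grants: the group needs USE privileges on the containers plus
    # SELECT on the table itself to query finance.sales.orders.
    "GRANT USE CATALOG ON CATALOG finance TO `analysts`",
    "GRANT USE SCHEMA ON SCHEMA finance.sales TO `analysts`",
    "GRANT SELECT ON TABLE finance.sales.orders TO `analysts`",
]
```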
Taught by
Noah Gift and Alfredo Deza
Related Courses
- Achieving Advanced Insights with BigQuery - Français (Google Cloud via Coursera)
- Database Administration and SQL Language Basics (A Cloud Guru)
- SQL Deep Dive (A Cloud Guru)
- Using Python for Data Management and Reporting (A Cloud Guru)
- Advanced Data Modeling (نمذجة البيانات المتقدمة) (Meta via Coursera)