Leveraging Apache Spark and Delta Lake for Efficient Data Encryption at Scale

Offered By: Databricks via YouTube

Tags

Apache Spark Courses, Data Engineering Courses, Data Security Courses, Data Privacy Courses, Data Encryption Courses, Delta Lake Courses

Course Description

Overview

This 25-minute conference talk from Databricks presents Gecko, a CCPA compliance ecosystem that Mars Petcare's data engineering team built for Apache Spark and Delta Lake. Gecko automates consumer deletion requests, strengthens the security of PII data, preserves the integrity of non-PII data, and keeps PII data accessible when it is needed. The talk covers how row-level encryption is implemented for PII tables and how the encryption keys are stored. It also shows how Spark and Delta Lake support large-scale data encryption, automated handling of privacy rights requests, and stronger platform security, and it discusses using the labeled dataset generated along the way to develop machine learning models for automatic PII detection. The session closes with the benefits of the solution and plans for future work, and is aimed at organizations facing consumer data privacy compliance challenges.

Syllabus

Intro
Agenda
Authors
The Petcare Data Platform
Our Mission
Gecko Ecosystem
Key Generation
Data Encryption
Optimizing Parquet Encryption
Master Table Generation
Gecko Delete
Benefits
Future Work


Taught by

Databricks

Related Courses

Distributed Computing with Spark SQL
University of California, Davis via Coursera
Apache Spark (TM) SQL for Data Analysts
Databricks via Coursera
Building Your First ETL Pipeline Using Azure Databricks
Pluralsight
Implement a data lakehouse analytics solution with Azure Databricks
Microsoft via Microsoft Learn
Perform data science with Azure Databricks
Microsoft via Microsoft Learn