Leveraging Apache Spark and Delta Lake for Efficient Data Encryption at Scale
Offered By: Databricks via YouTube
Course Description
Overview
Explore an approach to data privacy and security in this 25-minute conference talk from Databricks. Learn how Mars Petcare's data engineering team developed Gecko, a California Consumer Privacy Act (CCPA) compliance ecosystem designed for Apache Spark and Delta Lake. Discover how Gecko automates consumer deletion requests, enhances PII data security, maintains non-PII data integrity, and keeps PII data accessible when needed. Understand the implementation of row-level encryption for PII tables and the strategic storage of encryption keys. Gain insights into leveraging Spark and Delta Lake for large-scale data encryption, automated privacy rights requests, and enhanced platform security. Explore the potential for using the generated labeled dataset to develop machine learning models for automatic PII detection. Delve into the technical aspects, benefits, and future possibilities of this data privacy solution, tailored for organizations facing challenges in consumer data privacy compliance.
Syllabus
Intro
Agenda
Authors
The Petcare Data Platform
Our Mission
Gecko Ecosystem
Key Generation
Data Encryption
Optimizing Parquet Encryption
Master Table Generation
Gecko Delete
Benefits
Future Work
Taught by
Databricks
Related Courses
Introduction to Data Analytics for Business
University of Colorado Boulder via Coursera
Digital and the Everyday: from codes to cloud
NPTEL via Swayam
Systems and Application Security
(ISC)² via Coursera
Protecting Health Data in the Modern Age: Getting to Grips with the GDPR
University of Groningen via FutureLearn
Teaching Impacts of Technology: Data Collection, Use, and Privacy
University of California, San Diego via Coursera