Using Apache Spark and Differential Privacy for 2020 Census Data Protection
Offered By: Databricks via YouTube
Course Description
Overview
Syllabus
Intro
Abstract
Outline
Privacy and the Decennial Census
2010 Census: Summary of Publications (approximate counts)
We performed a database reconstruction and re-identification attack for all 308,745,538 people in the 2010 Census
The basic idea of differential privacy: Uncertainty (noise) protects privacy
The Census Bureau is using differential privacy for the 2020 Census.
How much noise do we add? That's a policy decision.
We planned to create a Disclosure Avoidance System that dropped into the Census production system.
The Disclosure Avoidance System allows the Census Bureau to enforce global confidentiality protections
Our DP mechanism protects histograms of person types. Census "block"
Running the block-by-block algorithm with Spark
In 2018 we invented the TopDown Algorithm (TDA)
Key challenges in monitoring Spark
We created our own monitoring framework
Cluster List
Each DAS run is a "mission"
Mission Report
System Load
Free Memory
In Summary
Taught by
Databricks
Related Courses
Introduction to Data Analytics for Business
University of Colorado Boulder via Coursera
Digital and the Everyday: from codes to cloud
NPTEL via Swayam
Systems and Application Security
(ISC)² via Coursera
Protecting Health Data in the Modern Age: Getting to Grips with the GDPR
University of Groningen via FutureLearn
Teaching Impacts of Technology: Data Collection, Use, and Privacy
University of California, San Diego via Coursera