Noise-Aware Statistical Inference with Differentially Private Synthetic Data
Offered By: Finnish Center for Artificial Intelligence FCAI via YouTube
Course Description
Overview
Explore the challenges and solutions in analyzing differentially private synthetic data in this 27-minute conference talk by Ossi Räisä from the Finnish Center for Artificial Intelligence. Delve into the innovative Noise-Aware Multiple Imputation (NA+MI) pipeline, which combines synthetic data analysis techniques from multiple imputation with noise-aware Bayesian modeling. Discover how this approach addresses the issue of invalid inferences when analyzing differentially private synthetic data as if it were real. Learn about the novel NAPSU-MQ algorithm for discrete data generation using marginal queries, based on the principle of maximum entropy. Examine experimental results demonstrating the pipeline's ability to produce accurate confidence intervals from differentially private synthetic data, accounting for additional uncertainty from privacy noise. Gain insights into the limitations of this approach and its potential implications for privacy-preserving machine learning.
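The combining step borrowed from multiple imputation can be sketched in a few lines. The example below is a minimal illustration, not the speaker's implementation: it assumes the synthetic-data variant of Rubin's rules (Raghunathan, Reiter and Rubin, 2003), in which an analysis is run separately on each of m synthetic datasets and the results are pooled. The function name `combine_synthetic_estimates` and its arguments are hypothetical, chosen only for this sketch.

```python
from statistics import mean, variance


def combine_synthetic_estimates(point_estimates, variance_estimates):
    """Pool per-dataset results from m synthetic datasets.

    Assumed rule (synthetic-data variant of Rubin's rules):
        q_bar = mean of the point estimates
        b     = sample variance of the point estimates (between-dataset)
        v_bar = mean of the per-dataset variance estimates (within-dataset)
        T     = (1 + 1/m) * b - v_bar
    """
    m = len(point_estimates)
    if m < 2 or len(variance_estimates) != m:
        raise ValueError("need at least two datasets and matching lengths")

    q_bar = mean(point_estimates)
    b = variance(point_estimates)   # uses the m - 1 denominator
    v_bar = mean(variance_estimates)
    total_variance = (1 + 1 / m) * b - v_bar
    return q_bar, total_variance


# Hypothetical numbers: one coefficient estimated on three synthetic datasets.
estimate, total_var = combine_synthetic_estimates(
    [1.0, 2.0, 3.0], [0.1, 0.1, 0.1]
)
```

With few synthetic datasets this variance estimate can come out negative; the sketch returns it unchanged rather than applying any correction.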
Syllabus
Intro
Privacy-preserving ML @ FCAI
Introduction: Synthetic Data
Introduction: Differential Privacy
Introduction: Analysing Synthetic Data
Background: Differential Privacy
The Solution: Noise-Aware Multiple Imputation (NA+MI)
Rubin's Rules
The Bayesian Model - Variables
Results - Toy Example
Results - UCI Adult Dataset
Results - Marginal Accuracy on Adult
Limitations
Conclusion
Taught by
Finnish Center for Artificial Intelligence FCAI
Related Courses
Statistics One (Princeton University via Coursera)
Intro to Statistics (Stanford University via Udacity)
Mathematical Biostatistics Boot Camp 1 (Johns Hopkins University via Coursera)
Statistics: Making Sense of Data (University of Toronto via Coursera)
Case-Based Introduction to Biostatistics (Johns Hopkins University via Coursera)