Causal Inference with Differentially Private (Clustered) Outcomes

📅 2023-08-02

🏛️ arXiv.org

📈 Citations: 3

✨ Influential: 0

🤖 AI Summary

This paper addresses the trade-off between privacy protection and estimation accuracy when causal effects are estimated from differentially private outcomes. It proposes Cluster-DP, a new label differential privacy mechanism that incorporates a given clustering structure of the data. Using a measure of cluster quality, the authors show that Cluster-DP controls both privacy loss and estimator variance, and that the non-clustered and uniform-prior mechanisms are special cases. Because the injected noise is cluster-aware, Cluster-DP can reduce the variance of average treatment effect (ATE) estimates. Theoretical and empirical comparisons indicate that, under the same privacy guarantee, Cluster-DP achieves higher estimation accuracy than its unclustered and uniform-prior versions, particularly on datasets with high intra-cluster homogeneity.
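To see why a cluster prior can lower variance at some privacy cost, consider a generic randomized-response-with-prior mechanism. This is an illustrative assumption, not necessarily the paper's exact construction. In particular, how the paper builds the cluster priors from the data is not reproduced here. Assume that outcomes take values v_1, ..., v_K and that a unit belongs to cluster c. The response is kept with probability 1 − λ and is otherwise redrawn from a fixed, data-independent prior p_c with strictly positive entries. Under these assumptions, ε_c below is the per-unit label-DP level, and ŷ is the debiased response.

```latex
\begin{aligned}
\Pr(\tilde y = v_k \mid y) &= (1-\lambda)\,\mathbf{1}\{y = v_k\} + \lambda\, p_{c,k},\\
\varepsilon_c &= \max_k \log\frac{(1-\lambda) + \lambda p_{c,k}}{\lambda p_{c,k}}
  = \log\Bigl(1 + \frac{1-\lambda}{\lambda \min_k p_{c,k}}\Bigr),\\
\mu_c &= \sum_k p_{c,k} v_k, \qquad \sigma_c^2 = \sum_k p_{c,k}(v_k - \mu_c)^2,\\
\mathbb{E}[\tilde y \mid y] &= (1-\lambda)\,y + \lambda \mu_c
  \;\Longrightarrow\; \hat y = \frac{\tilde y - \lambda \mu_c}{1-\lambda},\quad \mathbb{E}[\hat y \mid y] = y,\\
\operatorname{Var}(\hat y \mid y) &= \frac{\lambda(1-\lambda)(y - \mu_c)^2 + \lambda \sigma_c^2}{(1-\lambda)^2}.
\end{aligned}
```

A prior concentrated near a cluster's typical response (μ_c close to y, small σ_c²) shrinks the variance term. This is where intra-cluster homogeneity helps. However, a concentrated prior also has a small min_k p_{c,k}, which raises ε_c at a fixed λ. The uniform prior p_{c,k} = 1/K gives ε = log(1 + K(1 − λ)/λ).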

📝 Abstract

Estimating causal effects from randomized experiments is only feasible if participants agree to reveal their potentially sensitive responses. Of the many ways of ensuring privacy, label differential privacy is a widely used measure of an algorithm's privacy guarantee, which might encourage participants to share responses without running the risk of de-anonymization. Many differentially private mechanisms inject noise into the original dataset to achieve this privacy guarantee, which increases the variance of most statistical estimators and makes the precise measurement of causal effects difficult: there exists a fundamental privacy-variance trade-off to performing causal analyses from differentially private data. With the aim of achieving lower variance for stronger privacy guarantees, we suggest a new differential privacy mechanism, Cluster-DP, which leverages any given cluster structure of the data while still allowing for the estimation of causal effects. We show that, depending on an intuitive measure of cluster quality, we can reduce the variance loss while maintaining our privacy guarantees. We compare its performance, theoretically and empirically, to that of its unclustered version and a more extreme uniform-prior version which does not use any of the original response distribution, both of which are special cases of the Cluster-DP algorithm.
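The abstract does not spell out the mechanism, so the following is only a minimal sketch of a generic cluster-aware randomized-response mechanism for discrete outcomes, paired with a debiased difference-in-means ATE estimator. Several pieces are assumptions and are not taken from the paper:

- Labels lie in {0, ..., K−1}.
- The per-cluster `priors` are fixed and public.
- The functions `cluster_randomized_response` and `debiased_ate` are hypothetical names.
- The simulated data are invented for illustration.

```python
import numpy as np


def cluster_randomized_response(y, clusters, priors, lam, rng):
    """Privatize integer labels in {0, ..., K-1} (hypothetical sketch).

    Each label is kept with probability 1 - lam and otherwise replaced by a
    draw from priors[c], the prior of the unit's cluster c. `priors` has shape
    (num_clusters, K) and is ASSUMED fixed and public, not fitted to the
    private labels.
    """
    y = np.asarray(y)
    clusters = np.asarray(clusters)
    noisy = y.copy()
    resample = rng.random(y.shape[0]) < lam
    for c in np.unique(clusters[resample]):
        idx = np.flatnonzero(resample & (clusters == c))
        noisy[idx] = rng.choice(priors.shape[1], size=idx.size, p=priors[c])
    return noisy


def debiased_ate(noisy, treat, clusters, priors, lam):
    """Difference in means of responses corrected for the mechanism's shift.

    E[noisy | y] = (1 - lam) * y + lam * mu_c, so
    (noisy - lam * mu_c) / (1 - lam) is unbiased for each unit's true y.
    """
    mu = priors @ np.arange(priors.shape[1])  # prior mean of each cluster
    y_hat = (noisy - lam * mu[clusters]) / (1 - lam)
    return y_hat[treat == 1].mean() - y_hat[treat == 0].mean()


# Simulated experiment (invented data): two clusters, labels in {0, 1, 2}.
rng = np.random.default_rng(0)
n, lam = 200_000, 0.5
clusters = rng.integers(0, 2, size=n)
cluster_dist = np.array([[0.7, 0.2, 0.1],   # cluster 0: mostly low labels
                         [0.1, 0.2, 0.7]])  # cluster 1: mostly high labels
y0 = np.empty(n, dtype=int)
for c in range(2):
    idx = np.flatnonzero(clusters == c)
    y0[idx] = rng.choice(3, size=idx.size, p=cluster_dist[c])
y1 = np.minimum(y0 + 1, 2)           # treatment raises the label by one, capped at 2
treat = rng.integers(0, 2, size=n)   # Bernoulli(1/2) assignment
y_obs = np.where(treat == 1, y1, y0)
true_ate = (y1 - y0).mean()

noisy = cluster_randomized_response(y_obs, clusters, cluster_dist, lam, rng)
est = debiased_ate(noisy, treat, clusters, cluster_dist, lam)
print(f"sample ATE {true_ate:.3f}, private estimate {est:.3f}")
```

The sketch's two degenerate settings loosely mirror the special cases named in the abstract. Putting every unit in a single cluster with one prior row gives an unclustered analogue. Setting every row of `priors` to 1/K gives a uniform-prior analogue. Repeating the simulation under each choice of `priors` is one way to compare their variances. Keep in mind that, for a fixed `lam`, a more concentrated prior also gives a weaker label-DP guarantee, so a fair comparison should match privacy levels rather than `lam`.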

Problem

Balancing privacy and variance in differentially private experiments

Improving causal effect estimation precision with clustered data

Reducing the variance penalty of privatized data without weakening privacy guarantees

Innovation

Uses clustering to improve the privacy-variance trade-off

Introduces a cluster-quality measure that determines how much variance can be saved

Evaluates Cluster-DP on real and simulated data