Estimation of conditional average treatment effects on distributed confidential data

📅 2024-02-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To estimate conditional average treatment effects (CATEs) when data are distributed across parties and cannot be centralized for confidentiality or privacy reasons, this paper proposes data collaboration double machine learning (DML). The method estimates CATE models from privacy-preserving fusion data constructed from the distributed data, which enables estimation and testing of semi-parametric CATE models without iterative communication. Because the model is semi-parametric, estimation and testing are more robust to model misspecification than with parametric methods. The method also supports collaborative estimation across multiple time points and different parties through the accumulation of a knowledge base. In simulations on synthetic, semi-synthetic, and real-world datasets, it performed as well as or better than other methods.

📝 Abstract

Estimation of conditional average treatment effects (CATEs) is an important topic in the sciences. CATEs can be estimated with high accuracy if data distributed across multiple parties can be centralized. However, such data are difficult to aggregate owing to confidentiality or privacy concerns. To address this issue, we propose data collaboration double machine learning, a method that can estimate CATE models from privacy-preserving fusion data constructed from distributed data, and evaluate the method through simulations. Our contributions are summarized in the following three points. First, our method enables estimation and testing of semi-parametric CATE models without iterative communication on distributed data. Semi-parametric CATE models enable estimation and testing that are more robust to model misspecification than parametric methods. Second, our method enables collaborative estimation between multiple time points and different parties through the accumulation of a knowledge base. Third, our method performed as well as or better than other methods in simulations using synthetic, semi-synthetic, and real-world datasets.
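
For readers unfamiliar with double machine learning, the sketch below shows the generic single-party version of the idea for a CATE that is linear in the covariates: residualize the outcome and the treatment on the covariates with cross-fitting, then regress the outcome residual on the treatment residual interacted with the covariates. It is not the paper's data collaboration procedure and involves no privacy-preserving fusion data. The function names (`simulate`, `ols_fit_predict`, `dml_linear_cate`), the plain least-squares nuisance learners, the continuous treatment, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def simulate(n, seed=0):
    """Hypothetical data: true CATE is theta(x) = 1 + 2 * x0."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    T = 0.5 * X[:, 1] + rng.normal(size=n)       # continuous treatment
    theta = 1.0 + 2.0 * X[:, 0]                  # effect varies with x0 only
    Y = theta * T + X[:, 1] + rng.normal(size=n)
    return X, T, Y


def ols_fit_predict(X_train, y_train, X_test):
    """Least squares with an intercept: fit on one fold, predict on another."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


def dml_linear_cate(X, T, Y, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of beta in theta(x) = [1, x] @ beta."""
    n = len(Y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    t_res = np.empty(n)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Nuisance models are fit on the other folds only (cross-fitting).
        y_res[test] = Y[test] - ols_fit_predict(X[train], Y[train], X[test])
        t_res[test] = T[test] - ols_fit_predict(X[train], T[train], X[test])
    # Final stage: regress the outcome residual on t_res * [1, x].
    Z = np.column_stack([np.ones(n), X]) * t_res[:, None]
    beta, *_ = np.linalg.lstsq(Z, y_res, rcond=None)
    return beta


if __name__ == "__main__":
    X, T, Y = simulate(4000, seed=1)
    # Coefficients for [intercept, x0, x1]; the simulated truth is [1, 2, 0].
    print(dml_linear_cate(X, T, Y))
```

In this simulation the linear outcome model is deliberately misspecified (the true conditional mean of the outcome contains a product term), yet the final-stage coefficients should still land near the simulated truth because the treatment model is correct. This gives a rough feel for the robustness to misspecification that the abstract attributes to the semi-parametric approach.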

Problem

Research questions and friction points this paper is trying to address.

Estimating CATEs on distributed confidential data

Privacy-preserving fusion for distributed CATE estimation

Robust semi-parametric CATE modeling without data centralization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Privacy-preserving fusion data for CATE estimation

Non-iterative semi-parametric model communication

Knowledge base accumulation across time and parties
