Membership Inference Attacks from Causal Principles

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the statistical inadequacy of existing membership inference attack evaluation methods—such as one-run and zero-run—which fail to accurately quantify a model’s memorization of training data and associated privacy risks. For the first time, the problem is formally framed as a causal inference task, where data memorization is defined as the causal effect of a sample’s inclusion in the training set. The analysis systematically uncovers interference bias in one-run evaluations and confounding bias in zero-run settings. Building on this insight, the paper introduces a novel evaluation metric and estimator with non-asymptotic consistency guarantees. Empirical validation on real-world datasets demonstrates their reliability under practical constraints, including non-reproducible training and distribution shifts, thereby establishing a rigorous theoretical foundation for privacy assessment in AI systems.

📝 Abstract
Membership Inference Attacks (MIAs) are widely used to quantify training data memorization and assess privacy risks. Standard evaluation requires repeated retraining, which is computationally costly for large models. One-run methods (single training with randomized data inclusion) and zero-run methods (post hoc evaluation) are often used instead, though their statistical validity remains unclear. To address this gap, we frame MIA evaluation as a causal inference problem, defining memorization as the causal effect of including a data point in the training set. This novel formulation reveals and formalizes key sources of bias in existing protocols: one-run methods suffer from interference between jointly included points, while zero-run evaluations popular for LLMs are confounded by non-random membership assignment. We derive causal analogues of standard MIA metrics and propose practical estimators for multi-run, one-run, and zero-run regimes with non-asymptotic consistency guarantees. Experiments on real-world data show that our approach enables reliable memorization measurement even when retraining is impractical and under distribution shift, providing a principled foundation for privacy evaluation in modern AI systems.
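The abstract's core idea, defining memorization as the causal effect of including a data point in the training set, can be illustrated with a toy multi-run sketch. This is not the paper's estimator; it is a minimal illustration in which the "model" is just the mean of a random training subset, the per-point "loss" is squared distance to that mean, and the effect is estimated as the difference in expected loss between runs that exclude and runs that include the target point. All names (`memorization_effect`, `p_include`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: scalar points; index 0 is an outlier that a mean-based
# "model" memorizes strongly (its loss drops a lot when it is included).
data = rng.normal(0.0, 1.0, size=20)
data[0] = 5.0

def loss(model, x):
    # Per-point loss: squared distance to the fitted mean.
    return (x - model) ** 2

def memorization_effect(data, target_idx, n_runs=2000, p_include=0.5):
    """Multi-run estimate of the causal effect of including data[target_idx]:
    E[loss | excluded] - E[loss | included] under random membership."""
    in_losses, out_losses = [], []
    for _ in range(n_runs):
        mask = rng.random(len(data)) < p_include   # randomized inclusion
        subset = data[mask]
        if subset.size == 0:
            continue
        model = subset.mean()                      # "training"
        l = loss(model, data[target_idx])
        (in_losses if mask[target_idx] else out_losses).append(l)
    return float(np.mean(out_losses) - np.mean(in_losses))

effect_outlier = memorization_effect(data, 0)   # large: heavily memorized
effect_typical = memorization_effect(data, 1)   # small: barely affects the fit
```

In this toy, the outlier's effect is much larger than a typical point's, matching the intuition that memorization concentrates on atypical samples; one-run and zero-run protocols approximate this multi-run quantity and, as the paper argues, incur interference and confounding biases respectively.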
Problem

Research questions and friction points this paper is trying to address.

Membership Inference Attacks
Causal Inference
Training Data Memorization
Privacy Risk
Statistical Validity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal Inference
Membership Inference Attacks
Memorization
Privacy Evaluation
Bias Correction
Mathieu Even
Inria Montpellier
Probabilities, Statistics, Optimization, Machine Learning
Clément Berenfeld
PreMeDICaL team, Inria, Idesp, Inserm, Université de Montpellier
Linus Bleistein
PhD candidate, Inria Paris, UEVE
Time Series, Trustworthy ML, Optimal Transport, Healthcare
Tudor Cebere
PreMeDICaL team, Inria, Idesp, Inserm, Université de Montpellier
Julie Josse
Senior Researcher, Inria
Missing Values, Low Rank Matrix, Causal Inference, R
A. Bellet
PreMeDICaL team, Inria, Idesp, Inserm, Université de Montpellier