🤖 AI Summary
This work addresses the evaluation of membership inference attacks (MIAs) under distribution heterogeneity between the target model's training data and the attacker's auxiliary data, challenging the conventional i.i.d. assumption. It proposes the first continuous heterogeneity quantification metric for tabular data, grounded in statistical distance measures to characterize distributional shift, and designs a dual-path heterogeneous simulation framework encompassing both synthetic and real-world distribution shifts. Using this framework, standard MIAs, including shadow training coupled with a logistic-regression attack model, are systematically evaluated under controlled heterogeneity. The two simulation paths yield opposite outcomes, 90% attack accuracy in one setup versus near-random performance (50%) in the other, showing that measured attack success depends strongly on how the distribution shift is simulated. The contributions include a reproducible, quantitative benchmark and methodology for privacy risk assessment in realistic, non-i.i.d. machine learning settings.
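The summary does not spell out the metric itself. As one hedged illustration of what a continuous, statistical-distance-based heterogeneity score for tabular data could look like (the function names and the choice of per-feature 1-D Wasserstein distance are assumptions, not the paper's actual definition), the sketch below averages the per-feature empirical Wasserstein-1 distances between two equal-size samples:

```python
import random

def feature_w1(xs, ys):
    # 1-D Wasserstein-1 distance between two equal-size empirical samples:
    # the mean absolute difference of the sorted values.
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def heterogeneity_score(A, B):
    # A, B: lists of rows (tabular data with the same columns).
    # Average the per-feature distances into one continuous score:
    # ~0 for homogeneous (same-distribution) samples, larger under shift.
    n_feat = len(A[0])
    cols_a = [[row[f] for row in A] for f in range(n_feat)]
    cols_b = [[row[f] for row in B] for f in range(n_feat)]
    return sum(feature_w1(ca, cb) for ca, cb in zip(cols_a, cols_b)) / n_feat

random.seed(0)
base    = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(500)]
similar = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(500)]
shifted = [[random.gauss(2, 1), random.gauss(0, 1)] for _ in range(500)]

print(heterogeneity_score(base, similar))  # small: homogeneous pair
print(heterogeneity_score(base, shifted))  # larger: heterogeneous pair
```

The point of such a score is the continuous scale: rather than a binary i.i.d./non-i.i.d. label, attack performance can be plotted against a measured degree of shift.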
📝 Abstract
Among all privacy attacks against Machine Learning (ML), membership inference attacks (MIA) have attracted the most attention. In these attacks, the attacker is given an ML model and a data point, and they must infer whether the data point was used for training. The attacker also has an auxiliary dataset to tune their inference algorithm. Attack papers commonly simulate setups in which the attacker's and the target's datasets are sampled from the same distribution. This setting is convenient for experiments, but it rarely holds in practice. ML literature commonly starts with similar simplifying assumptions (i.e., "i.i.d." datasets) and later generalizes the results to support heterogeneous data distributions. Similarly, our work makes a first step toward generalizing MIA evaluation to heterogeneous data. First, we design a metric to measure the heterogeneity between any pair of tabular data distributions. This metric provides a continuous scale on which to analyze the phenomenon. Second, we compare two methodologies for simulating data heterogeneity between the target and the attacker. These setups yield opposite performances: 90% attack accuracy vs. 50% (i.e., random guessing). Our results show that MIA accuracy depends on the experimental setup, and even though research on MIA considers heterogeneous data setups, there is no standardized baseline for how to simulate them. The lack of such a baseline for MIA experiments poses a significant challenge to risk assessments in real-world machine learning scenarios.
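The attack pipeline the abstract refers to, shadow training with a logistic-regression attack model, can be sketched in the homogeneous (same-distribution) setting. This is a minimal illustration under assumed choices (synthetic data, a deliberately overfitting decision tree as target and shadow model, scikit-learn estimators); it is not the paper's exact experimental pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical tabular data: 4 numeric features, label = sign of their sum.
    X = rng.normal(0.0, 1.0, size=(n, 4))
    y = (X.sum(axis=1) > 0).astype(int)
    return X, y

# Target model trained on its private data (a fully grown tree, so that
# confidences on members and non-members differ).
X_tgt, y_tgt = make_data(400)
target = DecisionTreeClassifier(random_state=0).fit(X_tgt, y_tgt)

# Attacker side: a shadow model trained on auxiliary data that, in this
# homogeneous setup, comes from the same distribution as the target's.
X_in, y_in = make_data(400)    # shadow "members"
X_out, y_out = make_data(400)  # shadow "non-members"
shadow = DecisionTreeClassifier(random_state=0).fit(X_in, y_in)

def true_label_conf(model, X, y):
    # Probability the model assigns to each point's true label.
    p = model.predict_proba(X)
    return p[np.arange(len(y)), y].reshape(-1, 1)

# Logistic-regression attack model fit on shadow confidences:
# label 1 = member of the shadow training set, 0 = non-member.
A = np.vstack([true_label_conf(shadow, X_in, y_in),
               true_label_conf(shadow, X_out, y_out)])
m = np.concatenate([np.ones(len(X_in)), np.zeros(len(X_out))])
attack = LogisticRegression().fit(A, m)

# Evaluate the attack against the target: its training points (members)
# vs. fresh points from the same distribution (non-members).
X_non, y_non = make_data(400)
T = np.vstack([true_label_conf(target, X_tgt, y_tgt),
               true_label_conf(target, X_non, y_non)])
t = np.concatenate([np.ones(len(X_tgt)), np.zeros(len(X_non))])
acc = (attack.predict(T) == t).mean()
print("attack accuracy:", acc)
```

The paper's heterogeneous setting would replace the attacker's `make_data` calls with samples from a shifted distribution; the abstract's point is that the attack accuracy observed then depends heavily on how that shift is simulated.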