Lost in the Averages: A New Specific Setup to Evaluate Membership Inference Attacks Against Machine Learning Models

📅 2024-05-24
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Existing membership inference attack (MIA) evaluations average privacy risk across sampled datasets, ignoring individual record-level risk and thereby distorting risk estimates for a specific model or synthetic data release. The problem arises from confounding multiple sources of randomness, particularly dataset sampling and weight initialization, in current evaluation protocols. We identify this as a methodological flaw and propose an evaluation setup in which weight initialization is the sole source of randomness, combining theoretical risk decomposition, controlled experiments, and state-of-the-art MIAs; we further consider a stronger adversary that leverages information about the target dataset. Our empirical analysis reveals that standard evaluations systematically underestimate the risk of high-risk records, that the specific setup substantially improves the precision of individual-level privacy risk quantification, and that incorporating target-dataset priors significantly boosts attack success rates.

📝 Abstract
Membership Inference Attacks (MIAs) are widely used to evaluate the propensity of a machine learning (ML) model to memorize an individual record and the privacy risk that releasing the model poses. MIAs are commonly evaluated similarly to ML models: the MIA is performed on a test set of models trained on datasets unseen during the attack's training, sampled from a larger pool, $D_{eval}$. The MIA is evaluated across all datasets in this test set, and thus across the distribution of samples from $D_{eval}$. While this was a natural extension of ML evaluation to MIAs, recent work has shown that a record's risk heavily depends on its specific dataset; for example, outliers are particularly vulnerable, yet an outlier in one dataset may not be one in another. The sources of randomness currently used to evaluate MIAs may thus lead to inaccurate individual privacy risk estimates. We propose a new, specific evaluation setup for MIAs against ML models, using weight initialization as the sole source of randomness. This allows us to accurately evaluate the risk associated with the release of a model trained on a specific dataset. Using SOTA MIAs, we empirically show that the risk estimates given by the current setup lead to many records being misclassified as low risk. We derive theoretical results which, combined with empirical evidence, suggest that the risk calculated in the current setup is an average of the risks specific to each sampled dataset, validating our use of weight initialization as the only source of randomness. Finally, we consider an MIA with a stronger adversary leveraging information about the target dataset to infer membership. Taken together, our results show that current MIA evaluation averages the risk across datasets, leading to inaccurate risk estimates, and that the risk posed by attacks leveraging information about the target dataset is potentially underestimated.
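
To make the distinction concrete, here is a minimal sketch (not the authors' code) contrasting the two evaluation setups: in the standard one, each repetition resamples the training set from the pool $D_{eval}$, confounding dataset sampling with weight initialization; in the paper's specific setup, the dataset is fixed and only the weight-initialization seed varies. The synthetic data, the MLP, and the per-record loss used as the membership signal are illustrative assumptions.

```python
# Sketch: dataset-averaged vs. specific MIA evaluation (illustrative only).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(2000, 10))                  # the larger pool D_eval
y_pool = (X_pool[:, 0] + 0.5 * rng.normal(size=2000) > 0).astype(int)
target = 0                                            # record whose risk we estimate

def train(X, y, seed):
    # random_state seeds the weight initialization (sklearn also uses it for
    # batch shuffling; a pure-PyTorch version could isolate initialization alone)
    return MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                         random_state=seed).fit(X, y)

def record_loss(model, x, y):
    # per-record cross-entropy; a low loss is evidence of membership
    return log_loss([y], model.predict_proba(x.reshape(1, -1)), labels=[0, 1])

# Standard setup: every repetition draws a fresh training set from the pool.
losses_avg = []
for rep in range(20):
    idx = rng.choice(len(X_pool), size=500, replace=False)
    idx[0] = target                                   # force the target record IN
    m = train(X_pool[idx], y_pool[idx], seed=rep)
    losses_avg.append(record_loss(m, X_pool[target], y_pool[target]))

# Specific setup: one fixed dataset; only the weight-init seed changes.
idx = rng.choice(len(X_pool), size=500, replace=False)
idx[0] = target
losses_spec = [record_loss(train(X_pool[idx], y_pool[idx], seed=s),
                           X_pool[target], y_pool[target]) for s in range(20)]

print("target-loss spread, dataset-averaged:", float(np.std(losses_avg)))
print("target-loss spread, specific setup  :", float(np.std(losses_spec)))
```

The spread under the standard setup mixes both randomness sources, which is why, per the paper, its per-record estimate behaves like an average of dataset-specific risks rather than the risk of the one dataset actually being released.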
Problem

Research questions and friction points this paper is trying to address.

Evaluating the privacy risk of machine learning models through membership inference attacks
Analyzing the limitations of traditional dataset-averaged privacy risk assessments
Proposing a model-specific evaluation setup for accurate privacy leakage estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes a model-seeded game for specific risk estimation
Uses a leave-one-out approach to audit privacy guarantees (see the sketch after this list)
Evaluates membership inference attacks on synthetic datasets
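
A hedged sketch of how the leave-one-out, model-seeded game could be instrumented, assuming the setup described in the abstract: both worlds share one fixed dataset, differing only in the target record's presence, and the game is replayed across weight-initialization seeds. The midpoint-threshold attack and toy data are placeholders for the paper's state-of-the-art MIAs.

```python
# Sketch: leave-one-out membership game on a fixed dataset (illustrative).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import log_loss

def train(X, y, seed):
    # the seed drives weight initialization, the game's only randomness source
    return MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                         random_state=seed).fit(X, y)

def record_loss(model, x, y):
    return log_loss([y], model.predict_proba(x.reshape(1, -1)), labels=[0, 1])

def loo_game(X, y, target, n_seeds=20):
    """World IN trains on all of (X, y); world OUT trains on the same data
    minus the target record. Returns the toy attack's TPR and FPR."""
    X_out, y_out = np.delete(X, target, axis=0), np.delete(y, target)
    l_in = [record_loss(train(X, y, s), X[target], y[target])
            for s in range(n_seeds)]
    l_out = [record_loss(train(X_out, y_out, s), X[target], y[target])
             for s in range(n_seeds)]
    # toy attack: guess "member" when the target's loss falls below the
    # midpoint of the two worlds' mean losses
    thr = (np.mean(l_in) + np.mean(l_out)) / 2
    return (float(np.mean(np.array(l_in) < thr)),
            float(np.mean(np.array(l_out) < thr)))

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 10))
y = (X[:, 0] > 0).astype(int)
tpr, fpr = loo_game(X, y, target=0)
print(f"TPR={tpr:.2f}  FPR={fpr:.2f}")  # TPR >> FPR indicates high specific risk
```

Because both worlds fix the dataset, a gap between TPR and FPR here reflects the target record's risk in this particular release, not an average over resampled datasets.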