🤖 AI Summary
Membership inference attacks (MIAs) commonly rely on numerous reference models for vulnerability assessment, incurring prohibitive computational overhead.
Method: This paper proposes the first reference-model-free, model-level privacy vulnerability assessment framework. It leverages structural disparities between the training and test loss distributions of the target model: samples vulnerable to MIAs shift from the high-loss tail to the low-loss head of the loss distribution after training, so the training distribution lacks high-loss outliers—a reliable indicator of privacy risk. The method builds on the true negative rate (TNR) of a simple loss-based attack and incorporates a nonlinear transformation to adapt effectively to large language models.
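A minimal sketch of the core quantity, assuming per-sample losses are available for members (training set) and non-members (held-out set). The choice of threshold here (a fixed quantile of the member losses) is our assumption for illustration; the paper's summary does not specify how the loss attack's operating point is set.

```python
import numpy as np

def tnr_of_loss_attack(train_losses, test_losses, tpr_target=0.99):
    """Estimate model-level vulnerability as the TNR of a simple loss-threshold attack.

    The attack predicts 'member' whenever a sample's loss falls below a threshold.
    Here the threshold is chosen (by assumption) so the attack captures a fixed
    fraction `tpr_target` of true members; the TNR is the fraction of non-members
    whose loss exceeds that threshold. When the training distribution has no
    high-loss outliers, the threshold is low and the TNR is high, which the
    summary links to elevated privacy risk.
    """
    # Threshold covering `tpr_target` of the member (training) losses.
    threshold = np.quantile(train_losses, tpr_target)
    # Non-members are correctly rejected when their loss exceeds the threshold.
    return float(np.mean(test_losses > threshold))
```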
Results: Extensive evaluation across diverse architectures and datasets demonstrates that the approach accurately estimates model vulnerability to state-of-the-art MIAs (e.g., LiRA), significantly outperforming low-overhead baselines (e.g., RMIA) and conventional distributional divergence metrics.
📝 Abstract
Membership inference attacks (MIAs) have emerged as the standard tool for evaluating the privacy risks of AI models. However, state-of-the-art attacks require training numerous, often computationally expensive, reference models, limiting their practicality. We present a novel approach for estimating model-level vulnerability (the TPR at low FPR) to membership inference attacks without requiring reference models. Empirical analysis shows loss distributions to be asymmetric and heavy-tailed and suggests that most points at risk from MIAs have moved from the tail (high-loss region) to the head (low-loss region) of the distribution after training. We leverage this insight to propose a method to estimate model-level vulnerability from the training and testing distributions alone: using the absence of outliers from the high-loss region as a predictor of risk. We evaluate our method, the TNR of a simple loss attack, across a wide range of architectures and datasets and show it to accurately estimate model-level vulnerability to the SOTA MIA (LiRA). We also show our method to outperform both low-cost (few reference models) attacks such as RMIA and other measures of distribution difference. Finally, we evaluate the use of non-linear functions to estimate risk and show the approach to be promising for large language models.
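To illustrate how such a reference-model-free score could be validated against the abstract's claim, one would compare it per model with the TPR at low FPR measured by an expensive attack such as LiRA. The sketch below is hypothetical: the arrays are placeholders, not results from the paper.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-model values: a cheap vulnerability estimate (e.g., the TNR of a
# loss-threshold attack) and the TPR of LiRA at a low FPR, measured separately for
# each target model. These numbers are placeholders for illustration only.
estimated_vulnerability = np.array([0.62, 0.71, 0.80, 0.55, 0.90])
lira_tpr_at_low_fpr = np.array([0.03, 0.06, 0.11, 0.02, 0.19])

# Rank correlation indicates whether the reference-model-free score orders models
# by risk in the same way as the reference-model-based attack.
rho, p_value = spearmanr(estimated_vulnerability, lira_tpr_at_low_fpr)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```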