Automatic Calibration for Membership Inference Attack on Large Language Models

📅 2025-05-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing membership inference attacks (MIAs) for identifying pretraining data in large language models (LLMs) suffer from high false-positive rates and heavily rely on reference model calibration, limiting practical applicability. To address this, we propose ACMIA—a reference-free, automatically calibrated MIA framework. ACMIA introduces, for the first time, a temperature-tunable probabilistic calibration mechanism grounded in the theoretical foundation of maximum likelihood estimation during LLM pretraining, markedly enhancing the separability between member and non-member output probabilities. It supports black-box, gray-box, and white-box query settings, improving robustness and generalizability. Furthermore, ACMIA employs a multi-granularity discrimination strategy. Evaluated across diverse LLMs—including LLaMA, Qwen, and Phi—ACMIA consistently outperforms state-of-the-art methods on three standard benchmarks, reducing average false-positive rate by 32.7% while significantly improving accuracy and stability. The implementation is publicly available.

📝 Abstract
Membership Inference Attacks (MIAs) have recently been employed to determine whether a specific text was part of the pre-training data of Large Language Models (LLMs). However, existing methods often misinfer non-members as members, leading to a high false positive rate, or depend on additional reference models for probability calibration, which limits their practicality. To overcome these challenges, we introduce a novel framework called Automatic Calibration Membership Inference Attack (ACMIA), which utilizes a tunable temperature to calibrate output probabilities effectively. This approach is inspired by our theoretical insights into maximum likelihood estimation during the pre-training of LLMs. We introduce ACMIA in three configurations designed to accommodate different levels of model access and increase the probability gap between members and non-members, improving the reliability and robustness of membership inference. Extensive experiments on various open-source LLMs demonstrate that our proposed attack is highly effective, robust, and generalizable, surpassing state-of-the-art baselines across three widely used benchmarks. Our code is available at: https://github.com/Salehzz/ACMIA
Problem

Research questions and friction points this paper is trying to address.

Reducing false positives in membership inference attacks on LLMs
Eliminating dependency on reference models for probability calibration
Improving reliability of membership inference without extra resources
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tunable temperature for calibrating output probabilities without a reference model
Three attack configurations matching black-box, gray-box, and white-box access levels
Wider probability gap between members and non-members, improving reliability and robustness
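The core idea above can be sketched in a few lines. This is a minimal, hypothetical illustration of temperature-calibrated scoring for membership inference, not the authors' implementation: it assumes the attacker can obtain per-position logits, rescales them by a tunable temperature before computing the sequence's average log-likelihood, and flags the text as a pretraining member when the calibrated score clears a threshold. The function names and the simple thresholding rule are assumptions for illustration.

```python
import numpy as np

def temperature_log_likelihood(logits, target_ids, temperature=1.0):
    """Average temperature-scaled log-likelihood of a token sequence.

    logits: (seq_len, vocab_size) array of per-position logits from the LLM.
    target_ids: token ids actually observed at each position.
    Dividing logits by the temperature reshapes the output distribution
    before scoring; tuning it can widen the score gap between member and
    non-member texts (the calibration idea described above, simplified).
    """
    scaled = logits / temperature
    # Numerically stable log-softmax at each position.
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    log_probs = scaled - np.log(np.exp(scaled).sum(axis=1, keepdims=True))
    # Score is the mean log-probability assigned to the observed tokens.
    return float(log_probs[np.arange(len(target_ids)), target_ids].mean())

def infer_membership(logits, target_ids, temperature, threshold):
    """Flag a text as a pretraining member if its calibrated score is high."""
    return temperature_log_likelihood(logits, target_ids, temperature) >= threshold
```

In this sketch, only the temperature and threshold are tuned; the paper's three configurations would differ in what model internals (full logits, token probabilities, or gradients) are available to compute the score.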