🤖 AI Summary
Existing membership inference attacks (MIAs) for identifying pretraining data in large language models (LLMs) suffer from high false-positive rates and rely heavily on reference-model calibration, limiting their practical applicability. To address this, we propose ACMIA, a reference-free, automatically calibrated MIA framework. ACMIA introduces, for the first time, a temperature-tunable probabilistic calibration mechanism grounded in the theory of maximum likelihood estimation during LLM pretraining, markedly enhancing the separability between member and non-member output probabilities. It supports black-box, gray-box, and white-box query settings and additionally employs a multi-granularity discrimination strategy, improving robustness and generalizability. Evaluated across diverse LLMs, including LLaMA, Qwen, and Phi, ACMIA consistently outperforms state-of-the-art methods on three standard benchmarks, reducing the average false-positive rate by 32.7% while significantly improving accuracy and stability. The implementation is publicly available.
📝 Abstract
Membership Inference Attacks (MIAs) have recently been employed to determine whether a specific text was part of the pre-training data of Large Language Models (LLMs). However, existing methods often misinfer non-members as members, leading to a high false positive rate, or depend on additional reference models for probability calibration, which limits their practicality. To overcome these challenges, we introduce a novel framework called Automatic Calibration Membership Inference Attack (ACMIA), which utilizes a tunable temperature to calibrate output probabilities effectively. This approach is inspired by our theoretical insights into maximum likelihood estimation during the pre-training of LLMs. We introduce ACMIA in three configurations designed to accommodate different levels of model access and increase the probability gap between members and non-members, improving the reliability and robustness of membership inference. Extensive experiments on various open-source LLMs demonstrate that our proposed attack is highly effective, robust, and generalizable, surpassing state-of-the-art baselines across three widely used benchmarks. Our code is available at: [GitHub](https://github.com/Salehzz/ACMIA).
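The core idea of temperature-calibrated scoring can be sketched in a few lines. The snippet below is a minimal, self-contained illustration, not the paper's exact scoring function: the function names (`temp_log_prob`, `calibrated_score`) are hypothetical, and the mean temperature-scaled log-likelihood shown here is a simplification of how a tunable temperature can widen the score gap between member and non-member texts.

```python
import math

def temp_log_prob(logits, target_idx, tau):
    """Log-probability of the target token under a temperature-scaled softmax.

    logits: per-vocabulary scores for one position; tau: temperature (> 0).
    Smaller tau sharpens the distribution, boosting high-confidence tokens.
    """
    scaled = [z / tau for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return scaled[target_idx] - log_norm

def calibrated_score(token_logits, targets, tau):
    """Mean temperature-calibrated log-likelihood over a token sequence.

    A higher score suggests membership; in practice the threshold and tau
    would be tuned (here both are left to the caller, as an assumption).
    """
    logps = [temp_log_prob(l, t, tau) for l, t in zip(token_logits, targets)]
    return sum(logps) / len(logps)
```

For example, on tokens the model predicts confidently, lowering `tau` below 1 pushes the calibrated score toward 0 (perfect likelihood), while low-confidence tokens are penalized more, which is the intuition behind using temperature to enlarge the member/non-member probability gap.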