(Token-Level) InfoRMIA: Stronger Membership Inference and Memorization Assessment for LLMs

📅 2025-10-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the coarse-grained, hard-to-localize nature of existing assessments of memorization and leakage in large language model (LLM) training. The authors propose InfoRMIA, a principled information-theoretic formulation of membership inference. It introduces token-level membership inference, enabling fine-grained quantification and localization of privacy leakage, from whole sequences down to individual generated tokens. Built on this information-theoretic formulation, InfoRMIA sharpens the membership signal while improving computational efficiency. Across multiple benchmarks, InfoRMIA consistently outperforms RMIA in sequence-level inference and, in addition, identifies which generated tokens are memorized. This provides both theoretical grounding and a practical basis for precise privacy risk assessment and targeted mitigation such as exact unlearning.
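As a rough illustration of the token-level idea (a minimal sketch only; the exact InfoRMIA score is defined in the paper, and the Hugging Face-style model interface, reference-model averaging, and function names below are assumptions), per-token membership signals can be formed by comparing how likely each token is under the target model versus a set of reference models:

```python
import torch
import torch.nn.functional as F

def token_log_probs(model, tokenizer, text, device="cpu"):
    """Per-token log-probabilities of `text` under a causal LM (HF-style API assumed)."""
    ids = tokenizer(text, return_tensors="pt")["input_ids"].to(device)
    with torch.no_grad():
        logits = model(ids).logits                       # [1, T, vocab]
    log_probs = F.log_softmax(logits[:, :-1], dim=-1)    # predict token t from tokens < t
    return log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)[0]

def token_membership_scores(target_model, ref_models, tokenizer, text):
    """Illustrative per-token signal: excess log-likelihood of each token under the
    target model relative to the average of reference models. Large values mark
    tokens that are candidates for memorization (this scoring rule is an
    assumption for the sketch, not the paper's exact statistic)."""
    target_lp = token_log_probs(target_model, tokenizer, text)
    ref_lp = torch.stack(
        [token_log_probs(m, tokenizer, text) for m in ref_models]
    ).mean(dim=0)
    return target_lp - ref_lp
```

Tokens with the largest scores would then be the natural candidates for targeted mitigation such as exact unlearning.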

📝 Abstract
Machine learning models are known to leak sensitive information, as they inevitably memorize (parts of) their training data. More alarmingly, large language models (LLMs) are now trained on nearly all available data, which amplifies the magnitude of information leakage and raises serious privacy risks. Hence, it is more crucial than ever to quantify privacy risk before the release of LLMs. The standard method to quantify privacy is via membership inference attacks, where the state-of-the-art approach is the Robust Membership Inference Attack (RMIA). In this paper, we present InfoRMIA, a principled information-theoretic formulation of membership inference. Our method consistently outperforms RMIA across benchmarks while also offering improved computational efficiency. In the second part of the paper, we identify the limitations of treating sequence-level membership inference as the gold standard for measuring leakage. We propose a new perspective for studying membership and memorization in LLMs: token-level signals and analyses. We show that a simple token-based InfoRMIA can pinpoint which tokens are memorized within generated outputs, thereby localizing leakage from the sequence level down to individual tokens, while achieving stronger sequence-level inference power on LLMs. This new scope rethinks privacy in LLMs and can lead to more targeted mitigation, such as exact unlearning.
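For context on the sequence-level baseline mentioned above, a simplified RMIA-style score compares the likelihood ratio of a candidate sequence against the ratios of population samples. The sketch below works in log space and is only an approximation of the published attack; the threshold gamma, the reference-model averaging, and the function names are assumptions.

```python
import numpy as np

def rmia_style_score(target_lp_x, ref_lp_x, target_lp_zs, ref_lp_zs, gamma=1.0):
    """Simplified RMIA-flavoured pairwise test (sketch, not the published attack).

    target_lp_x, ref_lp_x: log-likelihood of the candidate sequence x under the
        target model and (averaged) reference models.
    target_lp_zs, ref_lp_zs: the same quantities for a set of population samples z.
    Returns the fraction of z whose likelihood ratio x dominates by a factor gamma;
    higher values indicate likely membership.
    """
    lr_x = target_lp_x - ref_lp_x                              # log likelihood ratio of x
    lr_z = np.asarray(target_lp_zs) - np.asarray(ref_lp_zs)    # log likelihood ratios of z
    return float(np.mean(lr_x - lr_z > np.log(gamma)))
```

A token-based variant in the spirit of the paper would replace the sequence log-likelihoods with per-token scores and aggregate them, which is how leakage can be localized while still yielding a sequence-level decision.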
Problem

Research questions and friction points this paper is trying to address.

Quantifying privacy risks in large language models
Improving membership inference attacks for data leakage
Localizing memorization from sequences to individual tokens
Innovation

Methods, ideas, or system contributions that make the work stand out.

Token-level membership inference for LLMs
Information-theoretic formulation outperforms RMIA
Pinpoints memorized tokens for targeted mitigation