🤖 AI Summary
The opacity of training data in large language models (LLMs) poses significant risks of privacy leakage and copyright infringement.
Method: This paper introduces a novel membership inference attack (MIA) paradigm to determine whether a given text was included in a model's training set. We propose EM-MIA, a framework that jointly optimizes membership scores and prefix effectiveness scores via an expectation-maximization algorithm, so that each set of scores iteratively improves the other. Additionally, we design OLMoMIA, a benchmark built on the OLMo model with controllable task difficulty, to rigorously evaluate MIA robustness across difficulty levels.
Contribution/Results: Our method achieves state-of-the-art performance on WikiMIA and demonstrates superior robustness across diverse difficulty settings in OLMoMIA. Crucially, we are the first to identify and empirically validate the fundamental failure mechanism of existing MIA methods under high overlap between member and non-member distributions. This insight enables more reliable data contamination detection and regulatory compliance assessment, offering an interpretable, scalable, and principled tool for LLM auditing.
📝 Abstract
The advancement of large language models has grown in parallel with the opacity of their training data. Membership inference attacks (MIAs) aim to determine whether specific data was used to train a model. They offer valuable insights into detecting data contamination and ensuring compliance with privacy and copyright standards. However, MIA for LLMs is challenging due to the massive scale of training data and the inherent ambiguity of membership in texts. Moreover, creating realistic MIA evaluation benchmarks is difficult as training and test data distributions are often unknown. We introduce EM-MIA, a novel membership inference method that iteratively refines membership scores and prefix scores via an expectation-maximization algorithm. Our approach leverages the observation that these scores can improve each other: membership scores help identify effective prefixes for detecting training data, while prefix scores help determine membership. As a result, EM-MIA achieves state-of-the-art results on WikiMIA. To enable comprehensive evaluation, we introduce OLMoMIA, a benchmark built from OLMo resources, which allows controlling task difficulty through varying degrees of overlap between training and test data distributions. Our experiments demonstrate EM-MIA is robust across different scenarios while also revealing fundamental limitations of current MIA approaches when member and non-member distributions are nearly identical.
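To make the alternating refinement idea concrete, below is a minimal, hypothetical sketch of an EM-MIA-style loop. It is not the paper's implementation: the input matrix `cond_scores`, the sigmoid soft-membership step, and the correlation-based prefix weighting are all illustrative assumptions standing in for whatever calibrated scores one would actually obtain by querying the target LLM.

```python
import numpy as np

def em_mia_sketch(cond_scores, n_iters=5):
    """Toy alternating refinement of membership and prefix scores.

    cond_scores[i, j] is a hypothetical detection score for candidate
    text i when conditioned on prefix j (e.g., a calibrated
    log-likelihood from the target model); higher suggests membership.
    """
    n_texts, n_prefixes = cond_scores.shape
    # Initialize prefix effectiveness uniformly.
    prefix_w = np.full(n_prefixes, 1.0 / n_prefixes)
    for _ in range(n_iters):
        # Membership step: aggregate per-prefix scores, weighting
        # each prefix by its current estimated effectiveness.
        member = cond_scores @ prefix_w
        # Soft membership estimate in (0, 1), centered on the mean.
        p = 1.0 / (1.0 + np.exp(-(member - member.mean())))
        # Prefix step: score each prefix by how well its column
        # correlates with the current membership estimates (a simple
        # proxy for "this prefix separates members from non-members").
        centered = cond_scores - cond_scores.mean(axis=0)
        prefix_w = centered.T @ (p - p.mean())
        prefix_w = np.maximum(prefix_w, 1e-9)  # keep weights positive
        prefix_w /= prefix_w.sum()             # renormalize
    return member, prefix_w
```

On synthetic data where "member" rows score uniformly higher, the loop assigns those rows larger membership scores while concentrating weight on discriminative prefixes; the real method would replace both update rules with the paper's score definitions.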