Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models

📅 2025-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing membership inference attacks (MIAs) scale poorly to large language models (LLMs), constraining prior work to weak attackers or small-scale settings and leaving it unclear whether observed limitations stem from methodological weaknesses or inherent LLM robustness. Method: extend the state-of-the-art MIA LiRA to GPT-2 variants (10M–1B parameters), training many reference models on over 20 billion tokens from C4 to enable realistic pretraining-scale evaluation. Results: the first empirical benchmark for MIAs against pre-trained LLMs, demonstrating that LiRA is feasible but has limited efficacy (AUC < 0.7) on real-world LLMs. Crucially, its success shows no simple correlation with conventional privacy metrics, challenging prevailing assumptions about privacy–utility trade-offs and LLM vulnerability. This work closes a gap in LLM privacy assessment by providing a scalable, empirically grounded methodology and baseline results.

📝 Abstract
State-of-the-art membership inference attacks (MIAs) typically require training many reference models, making it difficult to scale these attacks to large pre-trained language models (LLMs). As a result, prior research has either relied on weaker attacks that avoid training reference models (e.g., fine-tuning attacks), or on stronger attacks applied to small-scale models and datasets. However, weaker attacks have been shown to be brittle, achieving close-to-arbitrary success, and insights from strong attacks in simplified settings do not translate to today's LLMs. These challenges have prompted an important question: are the limitations observed in prior work due to attack design choices, or are MIAs fundamentally ineffective on LLMs? We address this question by scaling LiRA, one of the strongest MIAs, to GPT-2 architectures ranging from 10M to 1B parameters, training reference models on over 20B tokens from the C4 dataset. Our results advance the understanding of MIAs on LLMs in three key ways: (1) strong MIAs can succeed on pre-trained LLMs; (2) their effectiveness, however, remains limited (e.g., AUC < 0.7) in practical settings; and (3) the relationship between MIA success and related privacy metrics is not as straightforward as prior work has suggested.
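LiRA's core decision rule can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: the function names are assumptions, the per-example losses are toy numbers, and the paper's per-token loss aggregation for LLMs is not reproduced. The idea is to take the target example's loss under reference models trained with and without it, fit a Gaussian to each population, and score membership by the log-likelihood ratio.

```python
import math

def gaussian_logpdf(x, mu, sigma):
    # log N(x; mu, sigma^2)
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def lira_score(target_loss, in_losses, out_losses, eps=1e-8):
    """Log-likelihood ratio membership score: higher => more likely a member.

    in_losses / out_losses: the target example's loss under reference models
    trained WITH / WITHOUT it (hypothetical inputs, for illustration only).
    """
    def fit(xs):
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs)
        return mu, math.sqrt(var) + eps

    mu_in, sd_in = fit(in_losses)
    mu_out, sd_out = fit(out_losses)
    return (gaussian_logpdf(target_loss, mu_in, sd_in)
            - gaussian_logpdf(target_loss, mu_out, sd_out))

# Members typically have lower loss: a loss near the "in" distribution
# scores positive, one near the "out" distribution scores negative.
print(lira_score(1.1, [0.9, 1.0, 1.1], [2.8, 3.0, 3.2]) > 0)  # True
print(lira_score(3.1, [0.9, 1.0, 1.1], [2.8, 3.0, 3.2]) < 0)  # True
```

Sweeping a threshold over these scores yields the ROC curve from which the paper's AUC numbers are read off.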
Problem

Research questions and friction points this paper is trying to address.

Scaling membership inference attacks to large language models
Evaluating effectiveness of strong attacks on pre-trained LLMs
Understanding relationship between MIA success and privacy metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scaling LiRA to GPT-2 architectures
Training reference models on 20B tokens
Evaluating MIA success on large datasets
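The AUC < 0.7 efficacy figure reported above is computed over attack scores for known members and non-members. A self-contained sketch of that evaluation, using the rank-based (Mann–Whitney) form of AUC so no plotting or library is needed; the function name and score lists are hypothetical:

```python
def mia_auc(member_scores, nonmember_scores):
    """AUC = probability that a randomly chosen member outscores a randomly
    chosen non-member (ties count half); equivalent to the area under the
    ROC curve of the thresholded attack."""
    pairs = len(member_scores) * len(nonmember_scores)
    wins = sum(m > n for m in member_scores for n in nonmember_scores)
    ties = sum(m == n for m in member_scores for n in nonmember_scores)
    return (wins + 0.5 * ties) / pairs

# Perfect separation gives 1.0; an uninformative attack hovers near 0.5.
print(mia_auc([2.0, 3.0], [0.0, 1.0]))  # 1.0
```

An AUC of 0.7 therefore means the attack ranks a true member above a non-member only 70% of the time, which is the sense in which the paper calls LiRA's efficacy limited.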