🤖 AI Summary
Membership inference attacks (MIAs) against pre-trained large language models (LLMs) face a fundamental challenge: existing classification-based approaches ignore the generative nature of LLMs and fail to capture sequence-level memorization behavior. Method: We propose a context-aware MIA framework that, for the first time, applies statistical hypothesis testing to token-level perplexity evolution trajectories, combining subsequence perplexity modeling, context-sensitive memory representation, and contrastive loss-dynamics analysis. Contribution/Results: Our method breaks from conventional transfer-based attack paradigms and significantly outperforms state-of-the-art loss-based MIAs across multiple mainstream pre-trained LLMs. It reveals strong context-dependent heterogeneity in how pre-training data is memorized, offering novel insight into LLM memorization mechanisms and the associated privacy risks, along with an effective analytical tool for rigorous privacy evaluation.
📝 Abstract
Prior Membership Inference Attacks (MIAs) on pre-trained Large Language Models (LLMs), adapted from attacks on classification models, fail because they ignore the generative process LLMs perform across token sequences. In this paper, we present a novel attack that adapts MIA statistical tests to the perplexity dynamics of subsequences within a data point. Our method significantly outperforms prior loss-based approaches, revealing context-dependent memorization patterns in pre-trained LLMs.
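The core idea of testing perplexity dynamics over subsequences can be illustrated with a toy sketch. This is not the paper's actual method: the sliding-window subsequence perplexity, the least-squares slope statistic, and the hand-made per-token log-probabilities below are all hypothetical simplifications standing in for a real LLM's outputs and the authors' statistical test.

```python
import math

def subsequence_perplexities(token_logprobs, window=4):
    """Perplexity of each sliding window of per-token log-probabilities."""
    ppls = []
    for i in range(len(token_logprobs) - window + 1):
        avg_nll = -sum(token_logprobs[i:i + window]) / window
        ppls.append(math.exp(avg_nll))
    return ppls

def trajectory_slope(ppls):
    """Least-squares slope of the perplexity trajectory over window positions.

    A strongly negative slope (perplexity dropping as context accumulates)
    is used here as a crude membership signal.
    """
    n = len(ppls)
    mx = (n - 1) / 2
    my = sum(ppls) / n
    num = sum((x - mx) * (y - my) for x, y in enumerate(ppls))
    den = sum((x - mx) ** 2 for x in range(n))
    return num / den

# Simulated per-token log-probs: a memorized ("member") sequence gets sharply
# more predictable as context grows; a non-member sequence stays flat.
member_logprobs = [-3.0, -2.5, -2.0, -1.5, -1.0, -0.8, -0.6, -0.4]
nonmember_logprobs = [-2.0, -2.1, -1.9, -2.2, -2.0, -2.1, -1.9, -2.0]

member_slope = trajectory_slope(subsequence_perplexities(member_logprobs))
nonmember_slope = trajectory_slope(subsequence_perplexities(nonmember_logprobs))
```

In this toy setup `member_slope` comes out clearly more negative than `nonmember_slope`; a real attack would obtain log-probabilities from the target model and replace the slope heuristic with a calibrated statistical test over many sequences.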