🤖 AI Summary
Existing membership inference attacks against code completion models rely on surrogate models or hand-crafted heuristics, failing to capture the subtle memorization patterns inherent in over-parameterized models.
Method: We propose the first adversarial prompt-driven membership inference framework tailored to code completion. It automatically generates semantics-preserving, syntactically valid adversarial prompts that elicit distinguishable completion behaviors between training and non-training samples; features are extracted directly from the actual completions to train a lightweight classifier, requiring no surrogate model or predefined rules.
Contribution/Results: The method exhibits strong cross-model transferability. Evaluated on mainstream models including Code Llama 7B, it improves AUC by up to 102% over state-of-the-art methods. Extensive experiments on the APPS and HumanEval benchmarks confirm its effectiveness and generalizability across diverse code generation tasks.
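To make the "semantically preserving and syntactically valid" prompt idea concrete, here is a minimal, hypothetical sketch of one such perturbation: renaming local variables. The specific transform is an illustrative assumption, not the paper's actual perturbation set.

```python
import ast

def rename_locals(source: str) -> str:
    """Rename every assigned variable to a fresh name. The transformed
    code is syntactically valid and computes the same result, so it can
    serve as one adversarial prompt variant (illustrative stand-in for
    the paper's perturbations)."""
    tree = ast.parse(source)
    # Collect only names that are written to; builtins/globals stay intact.
    assigned = sorted({n.id for n in ast.walk(tree)
                       if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)})
    mapping = {name: f"v{i}" for i, name in enumerate(assigned)}

    class Rename(ast.NodeTransformer):
        def visit_Name(self, node: ast.Name) -> ast.Name:
            node.id = mapping.get(node.id, node.id)
            return node

    return ast.unparse(Rename().visit(tree))
```

Feeding such variants to the victim model probes whether its completions track the original snippet (suggesting memorization) or drift with the surface form.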
📝 Abstract
Membership inference attacks (MIAs) on code completion models offer an effective way to assess privacy risks by inferring whether a given code snippet was part of the training data. Existing black-box and gray-box MIAs rely on expensive surrogate models or manually crafted heuristic rules, which limit their ability to capture the nuanced memorization patterns exhibited by over-parameterized code language models. To address these challenges, we propose AdvPrompt-MIA, a method specifically designed for code completion models that combines code-specific adversarial perturbations with deep learning. The core novelty of our method lies in designing a series of adversarial prompts that induce variations in the victim code model's output. By comparing these outputs with the ground-truth completion, we construct feature vectors to train a classifier that automatically distinguishes member from non-member samples. This design allows our method to capture richer memorization patterns and accurately infer training-set membership. We conduct comprehensive evaluations on widely adopted models, such as Code Llama 7B, over the APPS and HumanEval benchmarks. The results show that our approach consistently outperforms state-of-the-art baselines, with AUC gains of up to 102%. In addition, our method exhibits strong transferability across different models and datasets, underscoring its practical utility and generalizability.
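The pipeline the abstract describes — compare each adversarial completion against the ground truth, build a feature vector, train a lightweight classifier — can be sketched as follows. Both the similarity metric (a `difflib` ratio) and the logistic-regression classifier are assumed stand-ins; the paper's actual features and model are not specified here.

```python
import math
from difflib import SequenceMatcher

def feature_vector(ground_truth: str, completions: list[str]) -> list[float]:
    """One similarity score per adversarial prompt. Intuition: a model
    that memorized a member sample keeps reproducing the ground truth
    even under perturbation, so member scores stay uniformly high."""
    return [SequenceMatcher(None, ground_truth, c).ratio() for c in completions]

class MembershipClassifier:
    """Tiny logistic regression trained by SGD — a stand-in for the
    paper's lightweight member/non-member classifier."""

    def __init__(self, n_features: int, lr: float = 0.5, epochs: int = 500):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr, self.epochs = lr, epochs

    def predict_proba(self, x: list[float]) -> float:
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X: list[list[float]], y: list[int]) -> None:
        for _ in range(self.epochs):
            for x, label in zip(X, y):
                err = self.predict_proba(x) - label  # gradient of log-loss
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err
```

Non-member completions tend to degrade more under perturbation, so their similarity scores drop; that separation is the signal the classifier learns, with no surrogate model in the loop.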