Effective Code Membership Inference for Code Completion Models via Adversarial Prompts

📅 2025-11-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing membership inference attacks against code completion models rely on surrogate models or hand-crafted heuristics and fail to capture the subtle memorization patterns inherent in over-parameterized models. Method: We propose the first adversarial-prompt-driven membership inference framework tailored to code completion. It automatically generates semantics-preserving, syntactically valid adversarial prompts that elicit distinguishably different completion behaviors for training versus non-training samples; features are extracted directly from the actual completions to train a lightweight classifier, requiring no surrogate model or predefined rules. Contribution/Results: The method exhibits strong cross-model transferability. Evaluated on mainstream models including Code Llama 7B, it achieves up to a 102% improvement in AUC over state-of-the-art methods. Extensive experiments on the APPS and HumanEval benchmarks confirm its effectiveness and generalizability across diverse code generation tasks.

📝 Abstract
Membership inference attacks (MIAs) on code completion models offer an effective way to assess privacy risks by inferring whether a given code snippet was part of the training data. Existing black- and gray-box MIAs rely on expensive surrogate models or manually crafted heuristic rules, which limit their ability to capture the nuanced memorization patterns exhibited by over-parameterized code language models. To address these challenges, we propose AdvPrompt-MIA, a method specifically designed for code completion models, combining code-specific adversarial perturbations with deep learning. The core novelty of our method lies in designing a series of adversarial prompts that induce variations in the victim code model's output. By comparing these outputs with the ground-truth completion, we construct feature vectors to train a classifier that automatically distinguishes member from non-member samples. This design allows our method to capture richer memorization patterns and accurately infer training set membership. We conduct comprehensive evaluations on widely adopted models, such as Code Llama 7B, over the APPS and HumanEval benchmarks. The results show that our approach consistently outperforms state-of-the-art baselines, with AUC gains of up to 102%. In addition, our method exhibits strong transferability across different models and datasets, underscoring its practical utility and generalizability.
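The attack pipeline the abstract describes (query the victim model under several adversarial prompt variants, compare each completion against the ground-truth completion, and feed the resulting feature vector to a classifier) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `difflib` similarity features and the thresholded average are stand-ins for the paper's learned features and lightweight classifier.

```python
import difflib

def similarity(completion: str, ground_truth: str) -> float:
    # Sequence similarity in [0, 1]; a toy proxy for the paper's completion-based features.
    return difflib.SequenceMatcher(None, completion, ground_truth).ratio()

def feature_vector(completions: list[str], ground_truth: str) -> list[float]:
    # One feature per adversarial prompt variant: how closely the victim model's
    # completion under that variant matches the known ground-truth completion.
    return [similarity(c, ground_truth) for c in completions]

def is_member(features: list[float], threshold: float = 0.8) -> bool:
    # Stand-in for the trained classifier: memorized (member) samples tend to keep
    # reproducing the ground truth even when the prompt is perturbed.
    return sum(features) / len(features) >= threshold
```

In the full method, the feature vectors would train a classifier on labeled member/non-member samples; the fixed threshold here merely illustrates the decision boundary.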
Problem

Research questions and friction points this paper is trying to address.

Membership inference attacks assess privacy risks in code completion models by inferring training-set membership
Existing surrogate-model and heuristic-based methods fail to capture the nuanced memorization patterns of over-parameterized code models
How can adversarial prompts automatically distinguish member from non-member samples?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial prompts induce measurable variations in the victim model's completions
Feature vectors extracted from actual completions train a lightweight membership classifier, with no surrogate model or predefined rules
Consistently outperforms state-of-the-art baselines, with AUC gains of up to 102%
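One family of semantics-preserving perturbations such a prompt generator might apply is identifier renaming: each variant computes the same function but reads differently to the model. The regex-based rename below is an illustrative sketch only (the paper's actual prompt-generation procedure is not detailed here, and a real tool would use a parser rather than regexes).

```python
import re

def rename_identifier(code: str, old: str, new: str) -> str:
    # Semantics-preserving perturbation: rename one identifier wherever it appears
    # as a whole word. (A parser-based rename would be needed to avoid touching
    # string literals, comments, or attribute names.)
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

prompt = "def add(a, b):\n    total = a + b\n    return total"
# Several adversarial variants of the same prompt, all computing the same function.
variants = [rename_identifier(prompt, "total", alias) for alias in ("acc", "result", "s")]
```

A memorized (member) sample should yield near-identical completions across all variants, while a non-member's completions drift; that gap is what the feature vectors capture.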