🤖 AI Summary
This work addresses data contamination in large language models for code, which arises from training on non-permissively licensed data, by proposing a white-box membership inference attack framework that detects whether a model has memorized specific training samples. The method integrates abstract syntax tree (AST) analysis, multilingual logical pattern recognition, and offline linting to generate character-level weight masks. These masks are used to weight Transformer activations and calibrate token-level Z-scores, thereby emphasizing anomalous patterns characteristic of human-authored code while suppressing uninformative syntactic boilerplate. Evaluated on StarCoder2-3B and StarCoder2-7B, the approach achieves AUC-ROC scores of 0.7913 and 0.7867, respectively, significantly outperforming baseline methods such as Loss, Min-K% Prob, and PAC.
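The summary's first signal is a character-level weight mask that separates human-chosen content (identifiers, literals) from syntactic boilerplate. The paper's exact heuristics (spellchecking-based logic detection, offline linting) are not reproduced here; the sketch below is a simplified, hypothetical illustration using only Python's standard `ast` module, with weight values `hi`/`lo` chosen arbitrarily.

```python
import ast

def char_weight_mask(source: str, hi: float = 1.0, lo: float = 0.1) -> list[float]:
    """Toy character-level weight mask: high weight on identifier and
    literal characters (human-authored content), low weight elsewhere
    (syntactic boilerplate). A simplified stand-in for SERSEM's mask,
    which additionally uses spellchecking and offline linting."""
    # Byte-free character offsets of each line start within `source`.
    offsets, pos = [], 0
    for line in source.splitlines(keepends=True):
        offsets.append(pos)
        pos += len(line)
    mask = [lo] * len(source)
    for node in ast.walk(ast.parse(source)):
        # Names, constants, and function arguments carry human choices.
        if isinstance(node, (ast.Name, ast.Constant, ast.arg)):
            start = offsets[node.lineno - 1] + node.col_offset
            end = offsets[node.end_lineno - 1] + node.end_col_offset
            for i in range(start, end):
                mask[i] = hi
    return mask
```

For `"x = 1\n"`, this marks the characters of `x` and `1` with weight 1.0 and the `=`, spaces, and newline with weight 0.1, mirroring the summary's intent of down-weighting boilerplate.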
📝 Abstract
As Large Language Models (LLMs) for code increasingly rely on massive, often non-permissively licensed datasets, evaluating data contamination through Membership Inference Attacks (MIAs) has become critical. We propose SERSEM (Selective Entropy-Weighted Scoring for Membership Inference), a novel white-box attack framework that suppresses uninformative syntactic boilerplate to amplify specific memorization signals. SERSEM employs a dual-signal methodology: first, a continuous character-level weight mask is derived through static Abstract Syntax Tree (AST) analysis, spellchecking-based multilingual logic detection, and offline linting. Second, these heuristic weights are used to pool internal transformer activations and calibrate token-level Z-scores from the output logits. Evaluated on a 25,000-sample balanced dataset, SERSEM achieves a global AUC-ROC of 0.7913 on the StarCoder2-3B model and 0.7867 on the StarCoder2-7B model, consistently outperforming the implemented probability-based baselines Loss, Min-K% Prob, and PAC. Our findings demonstrate that focusing on human-centric coding anomalies provides a significantly more robust indicator of verbatim memorization than sequence-level probability averages.
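The abstract's second signal calibrates token-level Z-scores and pools them with the heuristic weights. A minimal, hypothetical sketch of such weight-calibrated pooling is given below; the function name and the simple within-sequence standardization are assumptions, not the paper's exact scoring rule.

```python
import math

def weighted_zscore(token_logprobs: list[float], token_weights: list[float]) -> float:
    """Toy membership score: standardize each token's log-probability
    against the sequence's own mean and standard deviation (a Z-score),
    then average the Z-scores under heuristic per-token weights so that
    boilerplate tokens contribute little. A simplified stand-in for
    SERSEM's calibration; higher scores suggest memorization."""
    n = len(token_logprobs)
    mean = sum(token_logprobs) / n
    var = sum((lp - mean) ** 2 for lp in token_logprobs) / n
    std = math.sqrt(var) or 1.0  # guard against zero variance
    zscores = [(lp - mean) / std for lp in token_logprobs]
    wsum = sum(token_weights) or 1.0
    return sum(w * z for w, z in zip(token_weights, zscores)) / wsum
```

With uniform weights the score collapses to zero by construction, which illustrates why the weighting matters: only when the mask concentrates on unusually well-predicted, human-authored tokens does the pooled Z-score separate members from non-members.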