SERSEM: Selective Entropy-Weighted Scoring for Membership Inference in Code Language Models

📅 2026-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses data contamination in large language models for code arising from the use of non-permissive training data by proposing a white-box membership inference attack framework that detects whether models have memorized training samples. The method integrates abstract syntax tree (AST) analysis, multilingual logical pattern recognition, and offline linting to generate character-level weight masks. These masks weight Transformer activations and calibrate token-level Z-scores, emphasizing the anomalous patterns characteristic of human-authored code while suppressing uninformative syntactic boilerplate. Evaluated on StarCoder2-3B and StarCoder2-7B, the approach achieves AUC-ROC scores of 0.7913 and 0.7867, respectively, significantly outperforming baseline methods such as Loss, Min-K% Prob, and PAC.
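The weight-mask idea in the summary can be sketched in a few lines. The paper derives its mask from AST analysis, multilingual logic detection, and linting; the toy version below approximates only the first ingredient with Python's `tokenize` module, upweighting human-chosen identifiers and literals and downweighting keywords and punctuation. All weight values and the function name are illustrative assumptions, not the paper's actual heuristic.

```python
import io
import keyword
import tokenize

def char_weight_mask(source: str) -> list[float]:
    """Toy character-level weight mask for Python source.

    Identifiers/literals (authorial choices) get high weight; keywords
    and operators (syntactic boilerplate) get low weight. Illustrative
    simplification of SERSEM's AST/lint-based mask, not the real thing.
    """
    weights = [0.5] * len(source)  # baseline for whitespace etc.
    # Map (line, col) token positions to absolute character offsets.
    line_starts = [0]
    for line in source.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        start = line_starts[tok.start[0] - 1] + tok.start[1]
        end = line_starts[tok.end[0] - 1] + tok.end[1]
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            w = 1.0   # human-chosen identifier: strong memorization signal
        elif tok.type in (tokenize.STRING, tokenize.NUMBER, tokenize.COMMENT):
            w = 0.9   # literals and comments also carry authorial choice
        elif tok.type in (tokenize.NAME, tokenize.OP):
            w = 0.1   # keyword or punctuation: boilerplate
        else:
            continue  # NEWLINE, INDENT, ENDMARKER, ...
        for i in range(start, min(end, len(source))):
            weights[i] = w
    return weights
```

In the paper this mask is then used twice: to pool internal activations and to reweight per-token scores, so boilerplate contributes little to the final membership score.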
📝 Abstract
As Large Language Models (LLMs) for code increasingly utilize massive, often non-permissively licensed datasets, evaluating data contamination through Membership Inference Attacks (MIAs) has become critical. We propose SERSEM (Selective Entropy-Weighted Scoring for Membership Inference), a novel white-box attack framework that suppresses uninformative syntactic boilerplate to amplify specific memorization signals. SERSEM utilizes a dual-signal methodology: first, a continuous character-level weight mask is derived through static Abstract Syntax Tree (AST) analysis, spellchecking-based multilingual logic detection, and offline linting. Second, these heuristic weights are used to pool internal transformer activations and calibrate token-level Z-scores from the output logits. Evaluated on a 25,000-sample balanced dataset, SERSEM achieves a global AUC-ROC of 0.7913 on the StarCoder2-3B model and 0.7867 on the StarCoder2-7B model, consistently outperforming the implemented probability-based baselines Loss, Min-K% Prob, and PAC. Our findings demonstrate that focusing on human-centric coding anomalies provides a significantly more robust indicator of verbatim memorization than sequence-level probability averages.
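The scoring side of the abstract can also be sketched. Below, a hedged illustration of (a) a weighted Z-score membership score in the spirit of SERSEM's calibration, where low-weight boilerplate tokens contribute little, and (b) the Min-K% Prob baseline it is compared against. The exact calibration, constants, and function names are assumptions for illustration; only Min-K% Prob follows its standard published definition (mean log-probability of the k% least likely tokens).

```python
import numpy as np

def weighted_zscore_score(token_logprobs, token_weights):
    """Hedged sketch of selective weighted scoring: standardize per-token
    log-probs to Z-scores within the sequence, then take a weighted mean
    so that boilerplate tokens (low weight) barely move the score."""
    lp = np.asarray(token_logprobs, dtype=float)
    w = np.asarray(token_weights, dtype=float)
    z = (lp - lp.mean()) / (lp.std() + 1e-8)  # per-sequence calibration
    return float(np.sum(w * z) / (np.sum(w) + 1e-8))

def min_k_prob(token_logprobs, k=0.2):
    """Min-K% Prob baseline: average log-prob of the k% least likely
    tokens. Higher scores suggest the sequence was seen in training."""
    lp = np.sort(np.asarray(token_logprobs, dtype=float))
    n = max(1, int(len(lp) * k))
    return float(lp[:n].mean())
```

The contrast in the abstract's conclusion is visible here: Min-K% Prob treats all tokens alike, while the weighted score lets the character-level mask decide which tokens count as evidence of memorization.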
Problem

Research questions and friction points this paper is trying to address.

Membership Inference Attacks
Code Language Models
Data Contamination
Memorization Detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Membership Inference Attack
Code Language Models
Entropy-Weighted Scoring
Abstract Syntax Tree
Verbatim Memorization
Kıvanç Kuzey Dikici
Bilkent University, Computer Engineering, Ankara, Turkey
Serdar Kara
Bilkent University, Computer Engineering, Ankara, Turkey
Semih Çağlar
Bilkent University, Computer Engineering, Ankara, Turkey
Eray Tüzün
Bilkent University
Software Analytics · Empirical Software Engineering · Software Productivity · Software Product Line Engineering · Bioinformatics
Sinem Sav
Bilkent University, Computer Engineering, Ankara, Turkey