TRACER: A Semantic-Aware Framework for Fine-Grained Contamination Detection in Code LLMs

📅 2026-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses semantic-level data contamination in the evaluation of code large language models (LLMs), where traditional methods fail to detect samples that are semantically similar but not exact duplicates. The authors propose TRACER, a framework built on a fine-grained three-level semantic taxonomy of code contamination (Functionally Identical, Nearly Identical, and Shared Logic), and construct the first benchmark dataset dedicated to this task. TRACER applies a semantics-aware multi-level matching strategy within a coarse-to-fine detection pipeline, leveraging large language model embeddings for semantic code comparison. With GPT-5 as the backbone, TRACER reaches an F1 score of 0.91 in fine-grained detection; in the binary setting it attains an F1 of 0.92, outperforming existing methods by 42% to 217%.
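A coarse-to-fine pipeline of the kind described above might be sketched as follows. This is a minimal illustration, not the authors' implementation: the token-level Jaccard filter, the `coarse_threshold` value, and the `toy_judge` function are all assumptions introduced here, and `toy_judge` is only a placeholder for whatever LLM-backed fine-grained stage TRACER actually uses.

```python
import re
from enum import Enum
from typing import Callable, Optional


class Level(Enum):
    """The three contamination levels named in the summary."""
    FUNCTIONALLY_IDENTICAL = "Functionally Identical"
    NEARLY_IDENTICAL = "Nearly Identical"
    SHARED_LOGIC = "Shared Logic"


def tokens(code: str) -> set:
    # Crude lexer: identifiers, integer literals, and single punctuation marks.
    return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code))


def jaccard(a: str, b: str) -> float:
    # Token-set overlap, used here as a cheap coarse similarity (an assumption).
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def toy_judge(a: str, b: str) -> Level:
    # Hypothetical stand-in for the fine stage; a real system would
    # compare program semantics rather than surface tokens.
    if " ".join(a.split()) == " ".join(b.split()):
        return Level.FUNCTIONALLY_IDENTICAL
    if jaccard(a, b) >= 0.8:
        return Level.NEARLY_IDENTICAL
    return Level.SHARED_LOGIC


def detect(
    benchmark_item: str,
    training_sample: str,
    judge: Callable[[str, str], Level] = toy_judge,
    coarse_threshold: float = 0.3,  # illustrative value, not from the paper
) -> Optional[Level]:
    # Coarse stage: discard pairs with little surface overlap.
    if jaccard(benchmark_item, training_sample) < coarse_threshold:
        return None
    # Fine stage: assign one of the three levels to surviving pairs.
    return judge(benchmark_item, training_sample)
```

The point of the two-stage layout is cost: the cheap filter prunes most benchmark/training pairs so that the expensive fine-grained comparison runs only on plausible candidates.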

📝 Abstract

Data contamination is a known threat to the reliability of model evaluation. However, it remains underexplored in code large language models (LLMs), where contamination often goes beyond exact duplication. We present TRACER, a semantic-aware framework for fine-grained code contamination detection. TRACER models contamination using three levels of semantic overlap (Functionally Identical, Nearly Identical, and Shared Logic) and detects them through a coarse-to-fine pipeline. We also introduce the first benchmark for fine-grained code contamination detection, spanning three widely used benchmarks and three representative post-training datasets. TRACER achieves strong and consistent performance across multiple LLM backbones, with GPT-5 reaching an F1 score of 0.91 in fine-grained detection. In the binary setting, TRACER attains an F1 of 0.92, outperforming existing methods by 42%-217%. We further conduct ablation studies and error analysis to assess the contributions of individual components in TRACER.
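For readers less familiar with the metrics quoted in the abstract, the sketch below shows how a binary F1 score and a relative improvement over a baseline are typically computed. It assumes the "42%-217%" figures are relative F1 gains, which the abstract does not state explicitly; the counts and baseline values used with these helpers would be hypothetical, not results from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive (contaminated) class."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new: float, baseline: float) -> float:
    """Percentage gain of `new` over `baseline` (assumed reading of the reported range)."""
    return (new - baseline) / baseline * 100.0
```

For example, with hypothetical counts of 8 true positives, 2 false positives, and 2 false negatives, precision and recall are both 0.8, so `f1_score(8, 2, 2)` is 0.8 up to floating-point rounding.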

Problem

Research questions and friction points this paper is trying to address.

data contamination

code LLMs

semantic overlap

fine-grained detection

model evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic-aware contamination detection

fine-grained code analysis

code LLM evaluation