🤖 AI Summary
This work addresses the challenge of detecting training data contamination in large language models (LLMs) without access to the original training corpus. We propose CoDeC, a contamination detection method that analyzes shifts in model confidence during in-context learning (ICL): it identifies memory traces by measuring anomalous confidence degradation when a model processes samples from its own training set, in contrast to unseen data. Crucially, we observe and formalize, for the first time, that training data interferes with ICL dynamics, inducing statistically detectable confidence suppression. This effect serves as an interpretable, model-agnostic contamination signal. Extensive experiments across multiple open-weight LLMs demonstrate that CoDeC's contamination scores robustly separate seen (contaminated) from unseen (clean) data. The method is fully automated, requires no access to the training data or model fine-tuning, and integrates seamlessly into standard evaluation pipelines. CoDeC establishes a new paradigm for privacy-aware data auditing and trustworthy model assessment.
📝 Abstract
We present Contamination Detection via Context (CoDeC), a practical and accurate method to detect and quantify training data contamination in large language models. CoDeC distinguishes between data memorized during training and data outside the training distribution by measuring how in-context learning affects model performance. We find that in-context examples typically boost confidence for unseen datasets but may reduce it when the dataset was part of training, due to disrupted memorization patterns. Experiments show that CoDeC produces interpretable contamination scores that clearly separate seen and unseen datasets, and reveals strong evidence of memorization in open-weight models with undisclosed training corpora. The method is simple, automated, and both model- and dataset-agnostic, making it easy to integrate with benchmark evaluations.
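The core intuition above (in-context examples raise confidence on unseen data but can lower it on memorized data) can be sketched as a simple score. The function and threshold below are illustrative assumptions, not the paper's exact formulation: here the contamination score is simply the fraction of samples whose average log-probability drops when in-context examples are prepended, relative to the zero-shot setting.

```python
def contamination_score(zero_shot_logprobs, icl_logprobs):
    """Hypothetical sketch of the CoDeC-style signal: the fraction of
    samples whose model confidence (average log-probability) decreases
    when in-context examples are added. Higher values suggest the
    dataset was memorized during training."""
    assert len(zero_shot_logprobs) == len(icl_logprobs)
    drops = sum(1 for z, c in zip(zero_shot_logprobs, icl_logprobs) if c < z)
    return drops / len(zero_shot_logprobs)

# Toy illustration (made-up numbers): unseen data typically gains
# confidence from ICL, while memorized data loses it.
unseen = contamination_score([-2.1, -1.8, -2.5], [-1.5, -1.2, -1.9])  # → 0.0
seen = contamination_score([-0.4, -0.5, -0.3], [-0.9, -1.1, -0.8])    # → 1.0
```

In practice the per-sample log-probabilities would come from the evaluated model itself, scored once zero-shot and once with in-context examples drawn from the same dataset.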