AI Summary
This work addresses the overestimation of large language models' reasoning capabilities caused by data contamination, particularly subtle paraphrased contamination. To tackle this issue, the authors propose Zero-CoT Probe (ZCP), a black-box detection method that needs no access to model internals. ZCP truncates the chain-of-thought reasoning, constructs an isomorphically perturbed reference dataset, and compares the model's zero-CoT performance on original versus perturbed inputs to reveal whether it relies on memorization rather than genuine reasoning. ZCP targets concealed contamination that current approaches struggle to detect reliably, and it introduces a Contamination Confidence metric that quantifies both the likelihood and severity of contamination. Experiments demonstrate that ZCP significantly outperforms existing approaches on both known contaminated models and those specifically fine-tuned to evade detection, robustly identifying both direct and paraphrased data contamination.
Abstract
Large language models (LLMs) have demonstrated impressive reasoning abilities across a wide range of tasks, but data contamination undermines the objective evaluation of these capabilities. This problem is further exacerbated by malicious model publishers who use evasive, or indirect, contamination strategies, such as paraphrasing benchmark data to evade existing detection methods and artificially boost leaderboard performance. Current approaches struggle to reliably detect such stealthy contamination. In this work, we uncover a critical phenomenon: a model's generated reasoning steps actively mask its underlying memorization. Inspired by this, we propose the Zero-CoT Probe (ZCP), a novel black-box detection method that deliberately truncates the entire Chain-of-Thought (CoT) process to expose latent shortcut mappings. To further isolate memorization from the model's intrinsic problem-solving capabilities, ZCP compares the model's zero-CoT performance on the original benchmark against an isomorphically perturbed reference dataset. Furthermore, we introduce Contamination Confidence, a metric that quantifies both the likelihood and severity of contamination, moving beyond simple binary classifications. Extensive experiments on both previously identified contaminated models and specially fine-tuned contaminated models demonstrate that ZCP robustly detects both direct and evasive data contamination. The code for ZCP is accessible at https://github.com/Yifan-Lan/zero-cot-probe.
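The comparison at the heart of ZCP, zero-CoT accuracy on the original benchmark versus an isomorphically perturbed reference set, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `zero_cot_accuracy`, `gap_p_value`, and `toy_model` are hypothetical names, the model is abstracted as a function that returns a final answer with no reasoning steps, and the one-sided two-proportion z-test is a generic stand-in for scoring the gap. It is not the paper's Contamination Confidence metric, whose definition is not given here.

```python
import math
from typing import Callable, Sequence


def zero_cot_accuracy(
    answer_fn: Callable[[str], str],
    questions: Sequence[str],
    gold: Sequence[str],
) -> float:
    """Exact-match accuracy when the model answers directly, without CoT.

    `answer_fn` is a hypothetical wrapper around the model under test that
    returns only the final answer string.
    """
    correct = sum(answer_fn(q).strip() == g.strip() for q, g in zip(questions, gold))
    return correct / len(questions)


def gap_p_value(acc_orig: float, acc_pert: float, n_orig: int, n_pert: int) -> float:
    """One-sided two-proportion z-test on the accuracy gap.

    A small value means the original-set accuracy exceeds the perturbed-set
    accuracy by more than sampling noise would explain. This is an assumed
    stand-in, not the paper's metric.
    """
    pooled = (acc_orig * n_orig + acc_pert * n_pert) / (n_orig + n_pert)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_orig + 1 / n_pert))
    if se == 0:
        # Both accuracies are 0 or both are 1, so there is no gap to test.
        return 1.0
    z = (acc_orig - acc_pert) / se
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


# Toy stand-in for a contaminated model: it has memorized the original items
# and falls back to a fixed guess on anything it has not seen.
memorized = {"2+3=?": "5", "7*6=?": "42", "10-4=?": "6", "9/3=?": "3"}


def toy_model(question: str) -> str:
    return memorized.get(question, "0")


orig_q, orig_a = list(memorized), list(memorized.values())
# Perturbed items keep the same structure but change the numbers.
pert_q, pert_a = ["3+4=?", "8*6=?", "11-4=?", "8/2=?"], ["7", "48", "7", "4"]

acc_orig = zero_cot_accuracy(toy_model, orig_q, orig_a)
acc_pert = zero_cot_accuracy(toy_model, pert_q, pert_a)
p = gap_p_value(acc_orig, acc_pert, len(orig_q), len(pert_q))
print(f"original={acc_orig:.2f} perturbed={acc_pert:.2f} p={p:.4f}")
```

A model that solves problems rather than recalling them should score similarly on both sets, so a large positive gap between the original and perturbed accuracies is the signal of interest. A real evaluation would use far more than four items per set; the normal approximation behind the z-test is unreliable at this toy scale.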