🤖 AI Summary
Visual language models (VLMs) pretrained on internet-scale proprietary data often exhibit inflated performance due to test-set leakage, yet existing leakage detection methods fail on such models. This paper first systematically exposes the fundamental limitations of prevailing VLM contamination detection approaches. We propose a novel, multimodal semantic perturbation–based detection framework that constructs adversarial test environments by jointly perturbing the semantic spaces of images and texts—e.g., via attribute substitution or relational inversion—to expose model reliance on leaked data. Our method is robust across diverse contamination strategies, highly interpretable, and requires no access to training data or model gradients. Extensive experiments across realistic contamination scenarios demonstrate consistent superiority over baselines, achieving an average 23.6% improvement in detection accuracy. The code and perturbed benchmark dataset will be publicly released.
📝 Abstract
Recent advances in Vision-Language Models (VLMs) have achieved state-of-the-art performance on numerous benchmark tasks. However, the use of internet-scale, often proprietary, pretraining corpora raises a critical concern for both practitioners and users: inflated performance due to test-set leakage. While prior works have proposed mitigation strategies such as decontamination of pretraining data and benchmark redesign for LLMs, the complementary direction of developing detection methods for contaminated VLMs remains underexplored. To address this gap, we deliberately contaminate open-source VLMs on popular benchmarks and show that existing detection approaches either fail outright or exhibit inconsistent behavior. We then propose a novel simple yet effective detection method based on multi-modal semantic perturbation, demonstrating that contaminated models fail to generalize under controlled perturbations. Finally, we validate our approach across multiple realistic contamination strategies, confirming its robustness and effectiveness. The code and perturbed dataset will be released publicly.