🤖 AI Summary
Large language models (LLMs) often exhibit inflated evaluation scores due to data contamination—i.e., overlap between pretraining corpora and evaluation benchmarks—undermining assessment validity.
Method: We propose the Kernel Divergence Score (KDS), the first metric to exploit the asymmetric effect of fine-tuning on the embedding similarities of “seen” (contaminated) versus “unseen” (clean) samples. KDS quantifies contamination by computing the Frobenius norm of the difference between kernel similarity matrices of sample embeddings before and after fine-tuning on the benchmark.
Contribution/Results: KDS is theoretically grounded, correlates almost perfectly with ground-truth contamination rates (r ≈ 0.99), and remains robust across datasets, substantially outperforming existing baselines. Ablation studies confirm the importance of fine-grained, kernel-based information and of comparing embeddings before and after fine-tuning. KDS offers an interpretable, reproducible approach to detecting data contamination in LLM evaluation.
📝 Abstract
Dataset contamination, where evaluation datasets overlap with pre-training corpora, inflates performance metrics and undermines the reliability of model evaluations. Quantifying dataset contamination thus becomes essential to ensure that performance evaluations genuinely reflect a model's ability to generalize to unseen data, rather than relying on memorized examples. To address this problem, we propose the Kernel Divergence Score (KDS), a novel method that quantifies dataset contamination by computing the divergence between kernel similarity matrices of sample embeddings obtained before and after fine-tuning on the benchmark dataset. Leveraging the insight that fine-tuning affects unseen samples more significantly than seen ones, KDS provides a reliable measure of contamination. Through extensive experiments on controlled contamination scenarios, KDS demonstrates a near-perfect correlation with contamination levels and outperforms existing baselines. Additionally, we perform comprehensive ablation studies to analyze the impact of key design choices, providing deeper insights into the components and effectiveness of KDS. These ablations highlight the importance of leveraging fine-grained kernel-based information and confirm the reliability of the proposed framework across diverse datasets and settings.
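The core computation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the paper's exact kernel, bandwidth, and any normalization are not given here, so the RBF kernel and the `gamma` parameter below are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Pairwise RBF kernel similarity matrix for row-vector embeddings X (n x d).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values from floating-point error before exponentiating.
    return np.exp(-gamma * np.clip(d2, 0.0, None))

def kernel_divergence_score(emb_before, emb_after, gamma=1.0):
    # Divergence between kernel similarity matrices of the same samples'
    # embeddings, taken before vs. after fine-tuning on the benchmark:
    # here, the Frobenius norm of their difference.
    K_before = rbf_kernel(emb_before, gamma)
    K_after = rbf_kernel(emb_after, gamma)
    return np.linalg.norm(K_before - K_after, ord="fro")
```

Under the paper's insight, a heavily contaminated benchmark should yield a smaller shift in the kernel structure of its samples after fine-tuning than a clean one, so the score tracks the contamination level.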