🤖 AI Summary
This work identifies knowledge leakage, rather than genuine hypothetical document generation, as the primary driver behind the performance gains of LLM-based zero-shot query expansion on mainstream benchmarks. Using fact verification as a testbed, we systematically evaluate entailment relationships between generated documents and ground-truth evidence, finding that performance improvements are stable only when generated documents logically entail the true evidence. We provide the first empirical evidence that current benchmarks (e.g., FEVER, HotpotQA) suffer from implicit training-data leakage, leading to inflated estimates of model generalization. To address this, we propose a leakage-free evaluation paradigm tailored to niche and emerging knowledge retrieval. Our findings challenge the prevailing assumption that LLMs reliably generate valid hypothetical documents, and they offer both a diagnostic framework for trustworthy zero-shot retrieval and concrete pathways for benchmark reform.
📝 Abstract
Query expansion methods powered by large language models (LLMs) have demonstrated effectiveness in zero-shot retrieval tasks. These methods assume that LLMs can generate hypothetical documents that, when incorporated into a query vector, enhance the retrieval of real evidence. However, we challenge this assumption by investigating whether knowledge leakage in benchmarks contributes to the observed performance gains. Using fact verification as a testbed, we analyze whether the generated documents contain information entailed by ground-truth evidence and assess their impact on performance. Our findings indicate that performance improvements occur consistently only for claims whose generated documents include sentences entailed by the ground-truth evidence. This suggests that knowledge leakage may be present in these benchmarks, inflating the perceived performance of LLM-based query expansion methods, particularly in real-world scenarios that require retrieving niche or novel knowledge.
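The expansion step the abstract refers to, folding an LLM-generated hypothetical document into the query representation (as in HyDE-style methods), can be sketched as below. This is a minimal illustration, not the paper's implementation: `embed` is a deterministic hash-seeded stand-in for a real dense encoder (e.g., a sentence-embedding model), and the expansion simply averages the query embedding with the hypothetical-document embeddings before retrieval.

```python
import hashlib

import numpy as np


def embed(text: str, dim: int = 8) -> np.ndarray:
    """Hypothetical stand-in for a real dense encoder.

    Seeds a RNG from a hash of the text so the 'embedding' is
    deterministic; for illustration only, carries no semantics.
    """
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:4], "little")
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)


def expanded_query_vector(query: str, hypothetical_docs: list[str]) -> np.ndarray:
    """HyDE-style expansion: average the query embedding with the
    embeddings of LLM-generated hypothetical documents, then renormalize.
    The resulting vector replaces the plain query vector at retrieval time."""
    vecs = [embed(query)] + [embed(d) for d in hypothetical_docs]
    v = np.mean(vecs, axis=0)
    return v / np.linalg.norm(v)


# Usage: the expanded vector would be scored against corpus embeddings
# (e.g., by inner product) in place of the raw query embedding.
q_vec = expanded_query_vector(
    "who discovered penicillin",
    ["Penicillin was discovered by Alexander Fleming in 1928."],
)
```

The leakage concern in the paper applies exactly here: if the generated hypothetical document reproduces memorized benchmark evidence, the averaged vector is pulled toward the gold document for reasons unrelated to genuine generalization.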