🤖 AI Summary
Existing query performance prediction (QPP) methods that rely on generated query variants (QVs) are prone to topic drift and hallucination. This paper retrieves QVs instead: it first retrieves historical queries with similar information needs from a training set (e.g., MS MARCO) as 1-hop variants, then runs a second retrieval using the relevant documents of those variants to obtain 2-hop variants, which raises recall of queries with similar information needs. Unlike prior approaches that generate QVs through query expansion or non-contextual embeddings, the method makes retrieval, rather than generation, the core mechanism for QV construction. On TREC DL'19 and DL'20 with neural rankers such as MonoT5, QPP methods using the retrieved QVs outperform the best existing generated-QV-based approaches by up to around 20%.
📝 Abstract
Leveraging query variants (QVs), i.e., queries with information needs potentially similar to that of the target query, has been shown to improve the effectiveness of query performance prediction (QPP) approaches. Existing QV-based QPP methods generate QVs using either query expansion or non-contextual embeddings, which may introduce topic drift and hallucinations. In this paper, we propose a method that retrieves QVs from a training set (e.g., MS MARCO) for a given target query. To achieve high recall of the training queries whose information needs are most similar to that of the target query, we extend the directly retrieved QVs (1-hop QVs) with a second retrieval that uses their annotated relevant documents, which yields 2-hop QVs. Our experiments on TREC DL'19 and DL'20 show that QPP methods using QVs retrieved by our method outperform the best-performing existing generated-QV-based QPP approaches by up to around 20% on neural ranking models such as MonoT5.
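The 1-hop and 2-hop retrieval described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The bag-of-words cosine scorer, the function names (`retrieve_queries`, `retrieve_query_variants`), the cutoffs `k1` and `k2`, and the toy data are all assumptions made for illustration. The second hop is also one plausible reading of the abstract: the text of each relevant document of a 1-hop QV is used as a query against the training queries.

```python
import math
from collections import Counter


def cosine(a, b):
    """Bag-of-words cosine similarity (a stand-in for a real retriever)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_queries(text, train_queries, k, exclude=()):
    """Return the ids of the top-k training queries most similar to `text`."""
    scored = [(cosine(text, q), qid)
              for qid, q in train_queries.items() if qid not in exclude]
    scored = [(s, qid) for s, qid in scored if s > 0]  # drop non-matches
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [qid for _, qid in scored[:k]]


def retrieve_query_variants(target, train_queries, qrels, docs, k1=1, k2=1):
    """1-hop: training queries similar to the target query.
    2-hop: training queries retrieved with the text of the documents
    judged relevant to the 1-hop queries (assumed interpretation)."""
    one_hop = retrieve_queries(target, train_queries, k1)
    seen, two_hop = set(one_hop), []
    for qid in one_hop:
        for doc_id in qrels.get(qid, []):
            for cand in retrieve_queries(docs[doc_id], train_queries, k2, seen):
                seen.add(cand)
                two_hop.append(cand)
    return one_hop, two_hop


# Hypothetical toy training set: queries, relevance judgments, documents.
train_queries = {
    "q1": "what causes high blood pressure",
    "q2": "hypertension risk factors",
    "q3": "how to bake sourdough bread",
}
qrels = {"q1": ["d1"], "q2": ["d1"], "q3": ["d2"]}
docs = {
    "d1": "hypertension or high blood pressure has risk factors such as salt intake",
    "d2": "sourdough bread needs a starter and a long proof",
}

one_hop, two_hop = retrieve_query_variants(
    "causes of high blood pressure", train_queries, qrels, docs
)
print(one_hop, two_hop)
```

In this toy example, the query `q2` shares no terms with the target query, so the first hop misses it. The second hop reaches it through the relevant document of `q1`, which is the recall gain the abstract attributes to 2-hop QVs.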