Combining Query Performance Predictors: A Reproducibility Study

📅 2025-03-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work investigates whether fusing query performance prediction (QPP) methods improves prediction quality, and examines whether prior findings are reproducible with new models, evaluation metrics, and datasets.

Method: Supervised neural QPP models, which were not part of the earlier fusion study, are integrated into a unified fusion pipeline and evaluated on ClueWeb09B and TREC Deep Learning, using both pre- and post-retrieval approaches. Performance is assessed via sMARE alongside conventional measures (e.g., Pearson, Spearman). A finer-grained look at inter-method correlation is used to gauge how much information the combined methods share.

Contribution/Results: Most classical fusion conclusions remain robust; sMARE is more sensitive in distinguishing effective from ineffective fusions; and combinations of highly correlated methods often degrade performance due to redundancy. The study provides an updated empirical benchmark for QPP fusion and a correlation-based analysis of complementarity between methods.

📝 Abstract

A large number of approaches to Query Performance Prediction (QPP) have been proposed over the last two decades. As early as 2009, Hauff et al. [28] explored whether different QPP methods may be combined to improve prediction quality. Since then, significant research has been done on both QPP approaches and their evaluation. This study revisits Hauff et al.'s work to assess the reproducibility of their findings in the light of new prediction methods, evaluation metrics, and datasets. We expand the scope of the earlier investigation by: (i) considering post-retrieval methods, including supervised neural techniques (only pre-retrieval techniques were studied in [28]); (ii) using sMARE for evaluation, in addition to the traditional correlation coefficients and RMSE; and (iii) experimenting with additional datasets (ClueWeb09B and TREC DL). Our results largely support previous claims, but we also present several interesting findings. We interpret these findings by taking a more nuanced look at the correlation between QPP methods, examining whether they capture diverse information or rely on overlapping factors.
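To make the sMARE evaluation mentioned in the abstract more concrete, the following is a minimal sketch of how the measure is commonly computed: the absolute difference between each query's rank under the predictor and its rank under the actual retrieval effectiveness, scaled by the number of queries and averaged. The function names, the tie handling (average ranks), and the sample scores are assumptions made for illustration; they are not taken from the paper.

```python
def ranks(values):
    """Rank values in descending order (1 = highest).

    Tied values share the average of the positions they occupy
    (an assumption; other tie-breaking schemes are possible).
    """
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result


def smare(predicted, actual):
    """Scaled mean absolute rank error; 0.0 means the two rankings agree."""
    n = len(predicted)
    rank_pred, rank_act = ranks(predicted), ranks(actual)
    # Per-query scaled error is |rank difference| / n; sMARE is its mean.
    return sum(abs(p - a) for p, a in zip(rank_pred, rank_act)) / n / n


# Hypothetical per-query predictor scores and effectiveness values (e.g. AP).
predicted_scores = [0.9, 0.2, 0.5, 0.7]
actual_scores = [0.6, 0.1, 0.4, 0.3]
print(smare(predicted_scores, actual_scores))
```

Unlike a correlation coefficient, this measure is built from per-query errors, so lower values are better and the contribution of each query can be inspected separately.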

Problem

Research questions and friction points this paper is trying to address.

Investigating whether earlier findings on combining Query Performance Prediction (QPP) methods are reproducible

Expanding QPP evaluation with new metrics, datasets, and neural techniques

Analyzing correlation diversity among QPP methods for improved prediction
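The last point, checking whether QPP methods capture diverse information or overlap, can be illustrated with a pairwise correlation check. This is only a sketch under stated assumptions: the predictor names, the sample scores, and the 0.9 threshold are hypothetical, and the paper's actual analysis may differ.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def redundant_pairs(scores, threshold=0.9):
    """Return predictor pairs whose absolute correlation reaches the threshold.

    scores maps a predictor name to its per-query scores; the threshold
    is a hypothetical cut-off chosen for illustration.
    """
    flagged = []
    for name_a, name_b in combinations(sorted(scores), 2):
        r = pearson(scores[name_a], scores[name_b])
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical per-query scores for three predictors.
scores = {
    "predictor_a": [1.0, 2.0, 3.0, 4.0],
    "predictor_b": [2.0, 4.0, 6.0, 8.0],  # a rescaled copy of predictor_a
    "predictor_c": [4.0, 1.0, 3.0, 2.0],
}
for name_a, name_b, r in redundant_pairs(scores):
    print(f"{name_a} and {name_b} look redundant (r = {r:.2f})")
```

Pairs flagged this way carry largely the same signal, so adding both to a combination is unlikely to help and may add noise.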

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining pre- and post-retrieval QPP methods, including supervised neural techniques

Using sMARE as an evaluation metric alongside correlation coefficients and RMSE

Expanding datasets to ClueWeb09B and TREC DL
