🤖 AI Summary
Traditional query performance prediction (QPP) methods assess query difficulty solely within the context of a single ranker, making them ill-suited for fine-grained selection of the optimal ranker for a given query. This work generalizes the QPP task into three evaluation settings: single-ranker multi-query, multi-ranker single-query, and multi-ranker multi-query, and introduces the first unified QPP evaluation framework tailored for multi-ranker scenarios. Through systematic evaluation of standard QPP methods across diverse ranker–query combinations, the study reveals significant performance disparities among QPP models under different settings and demonstrates that predicting the best-performing ranker is notably more challenging than predicting query difficulty. This research establishes a new paradigm for developing QPP systems with stronger generalization and practical utility.
📄 Abstract
The traditional use case of query performance prediction (QPP) is to identify which queries perform well and which perform poorly for a given ranking model. A more fine-grained and arguably more challenging extension of this task is to determine which ranking models are most effective for a given query. In this work, we generalize the QPP task and its evaluation into three settings: (i) Single-Ranker Multi-Query (SRMQ-PP), corresponding to the standard use case; (ii) Multi-Ranker Single-Query (MRSQ-PP), which evaluates a QPP model's ability to select the most effective ranker for a query; and (iii) Multi-Ranker Multi-Query (MRMQ-PP), which considers predictions jointly across all query–ranker pairs. Our results show that (a) the relative effectiveness of QPP models varies substantially across tasks (SRMQ-PP vs. MRSQ-PP), and (b) predicting the best ranker for a query is considerably more difficult than predicting the relative difficulty of queries for a given ranker.
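The three settings can be pictured as correlating predicted and true effectiveness along different axes of a ranker × query score matrix. Below is a minimal sketch of that framing, assuming Kendall's tau as the correlation measure and per-pair effectiveness scores (e.g., nDCG) as the ground truth; the function names mirror the setting acronyms, and none of this is the paper's actual evaluation code.

```python
# Hedged sketch: one plausible instantiation of the three QPP evaluation
# settings, not the paper's exact protocol. `true` and `pred` are
# rankers-by-queries matrices of true effectiveness and QPP predictions.
# A simple Kendall tau-a is implemented inline to keep the sketch
# dependency-light.
import numpy as np

def kendall_tau(x, y):
    """Kendall's tau-a for tie-free score vectors."""
    n = len(x)
    concordance = sum(
        np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
        for i in range(n) for j in range(i + 1, n)
    )
    return concordance / (n * (n - 1) / 2)

def srmq_pp(true, pred):
    """Single-Ranker Multi-Query: rank queries within each ranker (row)."""
    return [kendall_tau(t, p) for t, p in zip(true, pred)]

def mrsq_pp(true, pred):
    """Multi-Ranker Single-Query: rank rankers within each query (column)."""
    return [kendall_tau(t, p) for t, p in zip(true.T, pred.T)]

def mrmq_pp(true, pred):
    """Multi-Ranker Multi-Query: one correlation over all pairs jointly."""
    return kendall_tau(true.ravel(), pred.ravel())
```

Note that under this framing MRSQ-PP correlates only one value per ranker for each query, so each per-query correlation rests on very few points, which is consistent with the abstract's finding that selecting the best ranker per query is the harder task.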