AI Summary
This work addresses the susceptibility of generative listwise reranking models to position bias: because these models are sensitive to the order in which candidates are presented, their outputs can deviate from true relevance. Existing debiasing methods struggle to balance effectiveness and efficiency. To overcome this limitation, the authors propose CapCal, a training-free framework that decouples position bias from ranking decisions for the first time, without requiring model retraining. CapCal estimates the bias distribution using content-agnostic placeholders and corrects the output logits via an entropy-adaptive contrastive mechanism. Across ten benchmarks, the method outperforms existing training-free approaches by substantial margins while keeping single-pass inference efficiency. With lightweight models (e.g., 0.6B parameters), it yields absolute NDCG gains of more than 10 points and also surpasses strong baselines such as permutation ensembles and data augmentation, combining high inference efficiency with superior ranking quality.
Abstract
Generative listwise reranking leverages global context for superior retrieval but is plagued by intrinsic position bias, where models exhibit structural sensitivity to input order independent of relevance. Existing mitigations present a dilemma: inference-time aggregation incurs prohibitive latency, while training-based methods often fail to eradicate ingrained priors, particularly in compact models. To resolve this dilemma, we propose CapCal (Content-Agnostic Probability Calibration), a training-free framework that mechanically decouples positional bias from ranking decisions. By estimating the bias distribution via content-free placeholders, CapCal rectifies output logits through an entropy-adaptive contrastive mechanism. Evaluations across 10 benchmarks confirm that CapCal achieves superior performance among training-free methods while preserving single-pass efficiency. Notably, it unlocks the latent potential of lightweight models (e.g., 0.6B), delivering absolute NDCG gains exceeding 10 points and outperforming both permutation-based aggregation and data-augmentation baselines.
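The abstract does not state CapCal's exact formulas, so what follows is only a minimal Python sketch of the general idea under explicit assumptions, not the authors' implementation. It assumes (1) the reranker exposes, for its first output slot, one logit per candidate identifier ([1], [2], ...); (2) a second forward pass on a hypothetical content-free prompt, in which every passage is replaced by a placeholder such as "N/A", yields logits that reflect only positional preference; and (3) the correction is contrastive, subtracting a weighted log-probability of that bias distribution, with a weight that shrinks as the bias distribution approaches uniform (maximum entropy). The entropy rule, `alpha_max`, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def normalized_entropy(log_probs: Sequence[float]) -> float:
    """Shannon entropy divided by log(n): 1.0 for a uniform distribution, near 0 for a peaked one."""
    n = len(log_probs)
    if n < 2:
        return 1.0
    h = -sum(math.exp(lp) * lp for lp in log_probs)
    return h / math.log(n)


def calibrate_logits(
    real_logits: Sequence[float],
    placeholder_logits: Sequence[float],
    alpha_max: float = 1.0,
) -> List[float]:
    """Hypothetical contrastive calibration (an assumption, not CapCal's published formula).

    real_logits: per-identifier logits from the prompt with the actual passages.
    placeholder_logits: per-identifier logits from the same prompt with every passage
        replaced by a content-free placeholder, read here as the positional bias.
    """
    if len(real_logits) != len(placeholder_logits):
        raise ValueError("both logit vectors must cover the same candidate positions")
    log_p = log_softmax(real_logits)
    log_bias = log_softmax(placeholder_logits)
    # Assumed entropy-adaptive weight: a flat bias distribution (entropy near log n)
    # signals little positional prior, so the correction shrinks toward zero.
    alpha = alpha_max * (1.0 - normalized_entropy(log_bias))
    # Contrastive step: relatively down-weight positions the model favors even without content.
    return [lp - alpha * lb for lp, lb in zip(log_p, log_bias)]


def rank_positions(scores: Sequence[float]) -> List[int]:
    """Candidate indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy numbers for illustration only, not values from the paper.
    real = [2.0, 1.8, 0.5, 0.2]          # candidate 0 looks best before calibration
    placeholder = [1.5, 0.0, 0.0, 0.0]   # model prefers position 0 even with no content
    print(rank_positions(real))                                  # prints [0, 1, 2, 3]
    print(rank_positions(calibrate_logits(real, placeholder)))  # prints [1, 0, 2, 3]
```

With these illustrative values, the raw logits put candidate 0 first, but the placeholder pass reveals a strong prior toward position 0, so the calibrated scores move candidate 1 ahead. A perfectly uniform placeholder distribution gives a weight of zero and leaves the ranking unchanged, which is why this sketch ties the correction strength to entropy; both the estimate and the correction cost one extra forward pass rather than multiple permutations.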