🤖 AI Summary
Large language models (LLMs) face two key challenges in passage reranking: prohibitive computational cost and sensitivity to external biases, such as positional or selection bias. To address these, we propose MVP, a multi-view guided non-generative reranker. MVP employs query-aware multi-view embedding encoding to disentangle passage representations and mitigate bias; introduces an orthogonal loss to enhance inter-view diversity; and adopts a single-step, non-autoregressive scoring mechanism for efficient inference. With only 220M parameters, MVP matches the performance of fine-tuned 7B LLMs while reducing inference latency by 100×; its 3B variant achieves state-of-the-art results on both in-domain and out-of-domain benchmarks. Our core contributions are (i) bias-robust multi-view representation learning and (ii) low-latency, non-autoregressive relevance modeling, enabling accurate, efficient, and generalizable reranking without generative overhead.
📝 Abstract
Recent advances in large language models (LLMs) have shown impressive performance in passage reranking tasks. Despite their success, LLM-based methods still face challenges in efficiency and sensitivity to external biases. (1) Existing models rely mostly on autoregressive generation and sliding window strategies to rank passages, which incur heavy computational overhead as the number of passages increases. (2) External biases, such as position or selection bias, hinder the model's ability to accurately represent passages and increase input-order sensitivity. To address these limitations, we introduce a novel passage reranking model, called Multi-View-guided Passage Reranking (MVP). MVP is a non-generative LLM-based reranking method that encodes query-passage information into diverse view embeddings without being influenced by external biases. For each view, it combines query-aware passage embeddings to produce a distinct anchor vector, which is then used to directly compute relevance scores in a single decoding step. In addition, it employs an orthogonal loss to make the views more distinctive. Extensive experiments demonstrate that MVP, with just 220M parameters, matches the performance of much larger 7B-scale fine-tuned models while achieving a 100x reduction in inference latency. Notably, the 3B-parameter variant of MVP achieves state-of-the-art performance on both in-domain and out-of-domain benchmarks. The source code is available at: https://github.com/bulbna/MVP
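To make the scoring mechanism described above concrete, here is a minimal NumPy sketch of the core idea: each view contributes an anchor vector, every passage gets a query-aware embedding, and relevance is computed in a single matrix multiply rather than autoregressive generation. The max-over-views aggregation, the unit-norm embeddings, and the Frobenius-style orthogonal penalty are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def orthogonal_loss(views):
    # views: (k, d) row-normalized view anchor vectors.
    # Penalize pairwise overlap between views (sum of squared
    # off-diagonal Gram entries) so the views stay distinctive.
    gram = views @ views.T
    off_diag = gram - np.diag(np.diag(gram))
    return float(np.sum(off_diag ** 2))

def relevance_scores(views, passage_embs):
    # passage_embs: (n, d) query-aware passage embeddings.
    # Single-step, non-autoregressive scoring: one matrix multiply
    # yields every passage's similarity to every view anchor;
    # aggregate across views (here, by max) into one score each.
    sims = passage_embs @ views.T   # (n, k) view similarities
    return sims.max(axis=1)         # (n,) relevance scores

# Toy example with random (hypothetical) embeddings.
rng = np.random.default_rng(0)
k, d, n = 4, 16, 5                  # views, dim, passages
views = rng.normal(size=(k, d))
views /= np.linalg.norm(views, axis=1, keepdims=True)
passages = rng.normal(size=(n, d))
passages /= np.linalg.norm(passages, axis=1, keepdims=True)

scores = relevance_scores(views, passages)
ranking = np.argsort(-scores)       # best passage first
```

Because every passage is scored independently in one pass, cost grows linearly with the candidate set and no sliding-window or input-order heuristics are needed, which is the source of both the latency gain and the robustness to positional bias claimed above.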