AI Summary
Large language models (LLMs) face significant performance, efficiency, and engineering bottlenecks in document re-ranking. Method: This paper introduces the first modular, lightweight, plug-and-play LLM re-ranking framework, unifying support for both open-source (Hugging Face) and API-based (e.g., OpenAI) models. It integrates prompt engineering, pointwise and pairwise scoring, LoRA-based parameter-efficient fine-tuning, and multi-granularity evaluation (e.g., NDCG, MAP), enabling model and strategy switching without code changes. Contribution/Results: The paper proposes a standardized evaluation protocol and a reproducible experimental benchmark, validating more than 10 LLMs on MS MARCO and TREC DL, where they consistently outperform traditional baselines. The open-source implementation has been widely adopted, filling a critical gap in practical LLM-based re-ranking tooling.
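To make the pointwise strategy concrete, the sketch below scores a query-document pair by the probability the LLM assigns to answering "Yes" to a relevance question, then sorts candidates by that score. This is a minimal illustration, not the framework's actual API: the model name and prompt template are assumptions chosen for the example.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model choice for illustration; any instruction-tuned causal LM works.
MODEL = "Qwen/Qwen2.5-0.5B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)

def pointwise_score(query: str, document: str) -> float:
    """Relevance score = P("Yes") under the LLM for a yes/no relevance prompt."""
    prompt = (
        f"Document: {document}\n"
        f"Query: {query}\n"
        "Is this document relevant to the query? Answer Yes or No.\nAnswer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]  # logits for the next token
    yes_id = tokenizer.encode(" Yes", add_special_tokens=False)[0]
    no_id = tokenizer.encode(" No", add_special_tokens=False)[0]
    # Normalize over just the two answer tokens and return the "Yes" mass.
    probs = torch.softmax(next_token_logits[[yes_id, no_id]], dim=-1)
    return probs[0].item()

# Re-rank retrieved candidates by descending relevance score.
candidates = ["LLMs can rerank retrieved passages.", "A recipe for banana bread."]
reranked = sorted(candidates, key=lambda d: pointwise_score("LLM reranking", d),
                  reverse=True)
```

A pairwise strategy instead prompts the model to compare two candidate documents at a time and aggregates the resulting preferences into a ranking.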
Abstract
Utilizing large language models (LLMs) for document reranking has been a popular and promising research direction in recent years, and many studies are dedicated to improving the performance and efficiency of LLM-based reranking. Beyond research, reranking also has many real-world applications, such as search engines and retrieval-augmented generation. In response to the growing demand from both research and practice, we introduce a unified framework, LLM4Ranking, which enables users to adopt different ranking methods with open-source or closed-source API-based LLMs. Our framework provides a simple and extensible interface for document reranking with LLMs, as well as easy-to-use evaluation and fine-tuning scripts for this task. We conducted experiments with this framework and evaluated various models and methods on several widely used datasets, providing reproducible results on utilizing LLMs for document reranking. Our code is publicly available at https://github.com/liuqi6777/llm4ranking.
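For readers unfamiliar with the reported metrics, the snippet below computes NDCG@k, one of the evaluation measures mentioned above. It is a minimal sketch assuming the linear-gain formulation (gain = graded relevance label, log2 rank discount); the relevance labels in the usage line are hypothetical.

```python
import math

def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain over the top-k relevance labels, in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances: list[float], k: int) -> float:
    """NDCG@k: DCG of the given ranking divided by DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Graded relevance labels of documents in the order the reranker returned them.
print(ndcg_at_k([3, 2, 0, 1], k=4))  # ~0.985; 1.0 would mean a perfect ordering
```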