🤖 AI Summary
Existing LLM-based ranking models rely on large-scale language models and explicit chain-of-thought (CoT) reasoning. This incurs high computational overhead and latency and hinders practical deployment. To address this, we propose TFRank, a lightweight pointwise ranking model built on small LLMs (e.g., 1.7B parameters) that features a "think-mode switch" mechanism. During training, TFRank uses multi-task learning to learn jointly from CoT-annotated data and fine-grained relevance scores. During inference, it skips explicit CoT generation and directly outputs relevance scores, so reasoning learned during training is applied without generating reasoning text at test time. Avoiding this redundant text generation substantially reduces latency and resource consumption. Experiments show that TFRank performs comparably to models with four times as many parameters on BRIGHT and remains highly competitive on BEIR. TFRank thus offers a practical, accurate, low-overhead solution for real-world retrieval systems.
📝 Abstract
Reasoning-intensive ranking models built on Large Language Models (LLMs) have made notable progress. However, existing approaches often rely on large-scale LLMs and explicit Chain-of-Thought (CoT) reasoning, which results in high computational cost and latency that limit real-world use. To address this, we propose TFRank, an efficient pointwise reasoning ranker based on small-scale LLMs. To improve ranking performance, TFRank integrates CoT data, fine-grained score supervision, and multi-task training. It also achieves an efficient "Think-Free" reasoning capability by employing a "think-mode switch" and pointwise format constraints. This allows the model to leverage explicit reasoning during training while delivering precise relevance scores for complex queries at inference, without generating any reasoning chains. Experiments show that TFRank (e.g., the 1.7B model) achieves performance comparable to models with four times as many parameters on the BRIGHT benchmark and is strongly competitive on the BEIR benchmark. Further analysis shows that TFRank strikes an effective balance between performance and efficiency, providing a practical way to integrate advanced reasoning into real-world systems. Our code and data are released at https://github.com/JOHNNY-fans/TFRank.
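The inference setup described above can be illustrated with a minimal sketch of pointwise reranking with a think-mode flag. This sketch makes several assumptions, and none of them come from the TFRank repository. The switch strings `[think]`/`[no_think]`, the 0-4 grade scale, the `<score>N</score>` output format, and the helpers `build_prompt`, `parse_score`, `rerank`, and `toy_generate` are all hypothetical. `toy_generate` is a word-overlap stand-in for the small LLM, not a real model call.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical placeholder switch strings: the abstract does not specify the
# control tokens TFRank uses for its think-mode switch.
THINK_ON = "[think]"
THINK_OFF = "[no_think]"

# Assumed pointwise format constraint: one integer relevance grade in tags.
SCORE_RE = re.compile(r"<score>\s*(\d+)\s*</score>")


def build_prompt(query: str, doc: str, think: bool) -> str:
    """Build a pointwise prompt for a single (query, document) pair."""
    switch = THINK_ON if think else THINK_OFF
    return (
        f"Query: {query}\n"
        f"Document: {doc}\n"
        "Rate the document's relevance to the query from 0 to 4 "
        "and answer as <score>N</score>.\n"
        f"{switch}"
    )


def parse_score(output: str) -> float:
    """Extract the grade; output that violates the format ranks last."""
    match = SCORE_RE.search(output)
    return float(match.group(1)) if match else float("-inf")


def rerank(
    query: str,
    docs: List[Tuple[str, str]],
    generate: Callable[[str], str],
    think: bool = False,
) -> List[Tuple[str, float]]:
    """Score each document independently, then sort by score, descending."""
    scored = [
        (doc_id, parse_score(generate(build_prompt(query, text, think))))
        for doc_id, text in docs
    ]
    # sorted() is stable (also with reverse=True), so ties keep input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_generate(prompt: str) -> str:
    """Hypothetical stand-in for the model: grades by word overlap, capped at 4."""
    query_line, doc_line = prompt.split("\n")[:2]
    q_words = set(query_line.removeprefix("Query: ").lower().split())
    d_words = set(doc_line.removeprefix("Document: ").lower().split())
    return f"<score>{min(len(q_words & d_words), 4)}</score>"
```

Consider `rerank("how do solar panels convert sunlight", [("d1", "panels need sunlight"), ("d2", "the cat sat on the mat"), ("d3", "how solar panels convert sunlight into electricity")], toy_generate)`. It returns `[("d3", 4.0), ("d1", 2.0), ("d2", 0.0)]`. Because scoring is pointwise, each document is graded in isolation, so the calls could be batched or parallelized. With `think=False` the sketch requests only the formatted score, loosely mirroring the "Think-Free" inference setting. `parse_score` searches for the tag anywhere in the output, so it would also tolerate reasoning text emitted before the score.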