ProRank: Prompt Warmup via Reinforcement Learning for Small Language Models Reranking

📅 2025-06-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Small language models (SLMs) show limited zero-shot performance in document reranking because they struggle to understand task prompts. Method: This paper proposes a two-stage training framework: first, GRPO-based reinforcement learning (RL) for "prompt warmup"—presented as the first application of RL to zero-shot prompt alignment for SLMs—which steers the SLM to produce accurate coarse-grained binary relevance judgments; second, a fine-grained score learning stage that continues fine-tuning the SLM to predict continuous relevance scores, without introducing additional layers or parameters. Results: On the BEIR benchmark, ProRank-0.5B outperforms a 32B LLM reranker in reranking quality while achieving 8× faster inference and 90% lower GPU memory consumption. The core contribution is a zero-shot prompt alignment mechanism tailored to SLMs, empirically demonstrating that RL can bridge the gap between compact models and complex retrieval prompts—improving accuracy, efficiency, and deployment feasibility at once.
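The fine-grained stage can be illustrated with a common parameter-free trick: read the model's next-token logits for a "yes"/"no" relevance question and normalize over that token pair to get a continuous score. This is a minimal sketch of that idea, not the paper's exact recipe; the token ids and logits below are hypothetical stand-ins.

```python
import numpy as np

def relevance_score(logits, yes_id, no_id):
    """Map an LM's next-token logits to a fine-grained relevance score
    by taking the softmax over the 'yes'/'no' token pair. Adds no new
    layers or parameters; illustrative, not ProRank's exact method."""
    pair = np.array([logits[yes_id], logits[no_id]], dtype=float)
    pair -= pair.max()                        # numerical stability
    probs = np.exp(pair) / np.exp(pair).sum()
    return float(probs[0])                    # P("yes" | query, document)

# Mock logits over a tiny vocabulary: index 0 = "yes", index 1 = "no".
score = relevance_score(np.array([2.0, 0.5, -1.0]), yes_id=0, no_id=1)
# score is in (0, 1); higher means more relevant
```

Documents can then be reranked by sorting on this score, with no extra regression head on top of the language model.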

📝 Abstract
Reranking is fundamental to information retrieval and retrieval-augmented generation, with recent Large Language Models (LLMs) significantly advancing reranking quality. However, current approaches primarily rely on large-scale LLMs (>7B parameters) through zero-shot prompting, incurring high computational costs. Small Language Models (SLMs) offer a promising alternative because of their efficiency, but our preliminary quantitative analysis reveals that they struggle to understand task prompts without fine-tuning, which limits their effectiveness for document reranking. To address this issue, we introduce ProRank, a novel two-stage training approach for SLM-based document reranking. First, we propose a prompt warmup stage using GRPO reinforcement learning to steer SLMs toward understanding task prompts and generating more accurate coarse-grained binary relevance scores. Then, we continue fine-tuning the SLMs with a fine-grained score learning stage, without introducing additional layers, to further improve reranking quality. Comprehensive experimental results demonstrate that ProRank consistently outperforms the most advanced open-source and proprietary reranking models. Notably, our lightweight ProRank-0.5B model even surpasses a powerful 32B LLM reranking model on the BEIR benchmark, establishing that properly trained SLMs can achieve superior document reranking performance while maintaining computational efficiency.
Problem

Research questions and friction points this paper is trying to address.

SLMs struggle to understand task prompts without fine-tuning
High computational cost of large-scale (>7B) LLM rerankers
Need for reranking models built on SLMs that are both efficient and effective
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage training for SLM reranking
Reinforcement learning prompt warmup
Fine-grained score learning without extra layers
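The prompt warmup stage relies on GRPO, whose defining step is computing group-relative advantages: several responses are sampled per prompt and each reward is normalized by the group's mean and standard deviation. A minimal sketch of that computation, assuming binary relevance rewards (this is illustrative, not the paper's full training loop):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages as used in GRPO: center each sampled
    response's reward on the group mean and scale by the group std.
    No learned value function (critic) is needed."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# One query-document prompt, four sampled responses scored with a
# binary relevance reward (1 = correct judgment, 0 = incorrect).
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
# correct responses get positive advantage, incorrect get negative
```

The advantages then weight the policy-gradient update on the SLM, reinforcing responses that read the prompt correctly.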