HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Agentic RAG systems suffer from inefficient search behavior, namely over-search (redundantly retrieving information the model already knows) and under-search (failing to retrieve needed knowledge), which causes high computational overhead and unreliable outputs. Existing reinforcement learning approaches rely solely on outcome-based rewards and therefore lack fine-grained control over the reasoning process. To address this, the paper proposes a hierarchical process reward mechanism that evaluates the necessity of each search decision on the fly by decomposing the agent's reasoning trajectory into discrete, parsable steps. The framework combines three complementary signals: a knowledge-grounded process reward, an outcome reward, and a format reward, enabling precise control over search versus no-search decisions. Evaluated on Qwen2.5 and Llama-3.2 models across seven QA benchmarks, the method achieves average accuracies of 65.4% (3B) and 67.2% (7B), reduces the over-search rate to just 2.3%, and significantly mitigates under-search, demonstrating substantial improvements in efficiency, accuracy, and cross-benchmark generalization.

📝 Abstract
Agentic RAG is a powerful technique for incorporating external information that LLMs lack, enabling better problem solving and question answering. However, suboptimal search behaviors exist widely, such as over-search (retrieving information already known) and under-search (failing to search when necessary), which lead to unnecessary overhead and unreliable outputs. Current training methods, which typically rely on outcome-based rewards in an RL framework, lack the fine-grained control needed to address these inefficiencies. To overcome this, we introduce Hierarchical Process Rewards for Efficient agentic RAG (HiPRAG), a training methodology that incorporates a fine-grained, knowledge-grounded process reward into RL training. Our approach evaluates the necessity of each search decision on-the-fly by decomposing the agent's reasoning trajectory into discrete, parsable steps. We then apply a hierarchical reward function that provides an additional bonus based on the proportion of optimal search and non-search steps, on top of commonly used outcome and format rewards. Experiments on the Qwen2.5 and Llama-3.2 models across seven diverse QA benchmarks show that our method achieves average accuracies of 65.4% (3B) and 67.2% (7B). This is accomplished while improving search efficiency, reducing the over-search rate to just 2.3% and concurrently lowering the under-search rate. These results demonstrate the efficacy of optimizing the reasoning process itself, not just the final outcome. Further experiments and analysis demonstrate that HiPRAG generalizes well across a wide range of RL algorithms, model families, sizes, and types. This work demonstrates the importance and potential of fine-grained control through RL for improving the efficiency and optimality of reasoning for search agents.
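The hierarchical structure described in the abstract (a process bonus granted on top of outcome and format rewards, proportional to the fraction of optimal search and non-search steps) can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's implementation: the step representation, weights, and the gating condition are all assumptions.

```python
def hierarchical_reward(steps, answer_correct, format_ok,
                        outcome_weight=1.0, format_weight=0.2,
                        process_weight=0.5):
    """Hypothetical sketch of a HiPRAG-style hierarchical reward.

    steps: list of dicts like {"searched": bool, "search_needed": bool},
           one per parsed reasoning step. All names and weights here are
           illustrative assumptions, not taken from the paper.
    """
    outcome_r = outcome_weight if answer_correct else 0.0
    format_r = format_weight if format_ok else 0.0

    # A step is "optimal" when the agent searched iff search was needed:
    # no over-search (searched unnecessarily) and no under-search
    # (skipped a needed search).
    if steps:
        optimal = sum(s["searched"] == s["search_needed"] for s in steps)
        process_r = process_weight * optimal / len(steps)
    else:
        process_r = 0.0

    # Hierarchy: the process bonus is only granted once the basic
    # outcome and format conditions are both satisfied.
    if answer_correct and format_ok:
        return outcome_r + format_r + process_r
    return outcome_r + format_r
```

The gating step is what makes the reward "hierarchical" rather than a flat weighted sum: the efficiency bonus cannot compensate for a wrong or malformed answer.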
Problem

Research questions and friction points this paper is trying to address.

Addresses suboptimal search behaviors in Agentic RAG systems
Overcomes limitations of outcome-based rewards in RL training
Improves search efficiency by reducing over-search and under-search
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical process rewards optimize search decisions dynamically
Decomposes reasoning trajectory into discrete parsable steps
Adds a step-level bonus for optimal search and non-search decisions on top of outcome and format rewards