🤖 AI Summary
Agentic RAG systems suffer from inefficient search behavior, namely over-retrieval (redundant retriever calls) and under-retrieval (failure to retrieve relevant knowledge), leading to high computational overhead and unreliable outputs. Existing reinforcement learning approaches, which rely solely on outcome-based rewards, lack fine-grained control over the reasoning process. To address this, we propose a hierarchical process reward mechanism that dynamically evaluates the necessity of retrieval at each step of the reasoning trajectory. Our framework integrates three complementary reward signals (a grounding-aware process reward, a result reward, and a format reward), enabling precise control over search versus no-search decisions. Evaluated on Qwen2.5 and Llama-3.2 models across seven QA benchmarks, our method achieves average accuracies of 65.4% (3B) and 67.2% (7B), reduces the over-retrieval rate to just 2.3%, and significantly lowers the under-retrieval rate, demonstrating substantial improvements in efficiency, accuracy, and cross-benchmark generalization.
📝 Abstract
Agentic RAG is a powerful technique for incorporating external information that LLMs lack, enabling better problem solving and question answering. However, suboptimal search behaviors are widespread, such as over-search (retrieving information already known) and under-search (failing to search when necessary), which lead to unnecessary overhead and unreliable outputs. Current training methods, which typically rely on outcome-based rewards in an RL framework, lack the fine-grained control needed to address these inefficiencies. To overcome this, we introduce Hierarchical Process Rewards for Efficient agentic RAG (HiPRAG), a training methodology that incorporates a fine-grained, knowledge-grounded process reward into RL training. Our approach evaluates the necessity of each search decision on the fly by decomposing the agent's reasoning trajectory into discrete, parsable steps. We then apply a hierarchical reward function that provides an additional bonus based on the proportion of optimal search and non-search steps, on top of the commonly used outcome and format rewards. Experiments on the Qwen2.5 and Llama-3.2 models across seven diverse QA benchmarks show that our method achieves average accuracies of 65.4% (3B) and 67.2% (7B), while improving search efficiency: the over-search rate drops to just 2.3%, and the under-search rate is lowered concurrently. These results demonstrate the efficacy of optimizing the reasoning process itself, not just the final outcome. Further experiments and analysis show that HiPRAG generalizes well across a wide range of RL algorithms, model families, sizes, and types. This work demonstrates the importance and potential of fine-grained control through RL for improving the efficiency and optimality of reasoning in search agents.
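As a rough illustration of the reward composition described in the abstract, the sketch below combines an outcome (result) reward, a format reward, and a process bonus proportional to the fraction of optimal search and non-search steps. The function name, the weights, the step representation, and the exact gating of the bonus are assumptions for illustration, not the paper's precise formulation; in particular, the knowledge-grounding check that decides whether a step actually needed retrieval is assumed to have already been run.

```python
def hiprag_reward(steps, answer_correct, format_ok, w_bonus=0.2):
    """Hypothetical sketch of a HiPRAG-style hierarchical reward.

    Each element of `steps` is a dict such as
    {"searched": bool, "needed_search": bool}, where "needed_search"
    would come from an on-the-fly knowledge-grounding check of the
    parsed reasoning step (not implemented here). All names, weights,
    and the gating of the bonus are illustrative assumptions.
    """
    # Commonly used outcome and format rewards.
    result_reward = 1.0 if answer_correct else 0.0
    format_reward = 0.1 if format_ok else 0.0

    # A step is "optimal" when the agent searches exactly when needed:
    # it retrieves when knowledge is missing and skips retrieval otherwise.
    n_optimal = sum(1 for s in steps if s["searched"] == s["needed_search"])
    frac_optimal = n_optimal / len(steps) if steps else 0.0

    # Hierarchical gating (assumed): the process bonus is added only when
    # the trajectory already earns the outcome and format rewards, so
    # process quality refines correctness rather than replacing it.
    if answer_correct and format_ok:
        return result_reward + format_reward + w_bonus * frac_optimal
    return result_reward + format_reward
```

Under this sketch, a correct, well-formatted trajectory with half its steps optimal would score 1.0 + 0.1 + 0.2 × 0.5 = 1.2, while an incorrect one would forgo the process bonus entirely.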