🤖 AI Summary
Reinforcement learning (RL) for retrieval-augmented generation (RAG) in multi-hop question answering suffers from two key bottlenecks: the absence of global reasoning planning and unfaithful execution. Method: We propose a collaborative RL framework that jointly optimizes retrieval and reasoning. It structures multi-step inference via subgoal decomposition, integrates iterative evidence refinement with coherent planning, and introduces a dual-granularity reward mechanism (a planning-quality reward and a subgoal-completion reward) alongside progressive weight annealing to balance process consistency against final-answer accuracy. Contribution/Results: Our method significantly outperforms strong baselines on both in-domain and cross-domain benchmarks. Notably, it achieves average improvements of 14.2% in Exact Match (EM) and F1 while using only 42% of the training data required by strong baselines, empirically validating the effectiveness of co-optimizing global reasoning and faithful execution.
📝 Abstract
Reinforcement learning has recently shown promise in improving retrieval-augmented generation (RAG). Despite these advances, its effectiveness in multi-hop question answering (QA) remains constrained by two fundamental limitations: (i) the absence of global planning to structure multi-step reasoning, and (ii) unfaithful execution, which hinders effective query formulation and the consistent use of retrieved evidence. We propose GlobalRAG, a reinforcement learning framework designed to enhance global reasoning in multi-hop QA. GlobalRAG decomposes questions into subgoals, coordinates retrieval with reasoning, and refines evidence iteratively. To guide this process, we introduce a Planning Quality Reward and a SubGoal Completion Reward, which encourage coherent planning and reliable subgoal execution. In addition, a progressive weight annealing strategy balances process-oriented and outcome-based objectives. Extensive experiments on both in-domain and out-of-domain benchmarks demonstrate that GlobalRAG significantly outperforms strong baselines while using only 8k training examples (42% of the training data used by strong baselines), achieving average improvements of 14.2% in both EM and F1.
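To make the reward design concrete, the sketch below shows one plausible reading of the dual-granularity reward with progressive weight annealing: a process reward (planning quality plus subgoal completion) is blended with an outcome reward (final-answer correctness), and the blend shifts toward the outcome as training progresses. The linear schedule, the equal process-reward mix, and all function and variable names are illustrative assumptions, not the paper's exact formulation.

```python
def annealed_reward(
    r_plan: float,      # planning-quality reward in [0, 1] (assumed range)
    r_subgoal: float,   # subgoal-completion reward in [0, 1] (assumed range)
    r_outcome: float,   # outcome reward, e.g. answer EM in {0, 1}
    step: int,          # current training step
    total_steps: int,   # total training steps
) -> float:
    """Blend process-oriented and outcome-based rewards, annealing
    the weight from process toward outcome over training."""
    # Assumed linear schedule: outcome weight grows from 0 to 1.
    alpha = min(step / total_steps, 1.0)
    # Assumed equal mix of the two process-level rewards.
    r_process = 0.5 * r_plan + 0.5 * r_subgoal
    return (1.0 - alpha) * r_process + alpha * r_outcome


# Early in training the process reward dominates; late in training
# the outcome reward dominates.
print(annealed_reward(0.8, 0.6, 1.0, step=100, total_steps=10_000))    # ~0.70
print(annealed_reward(0.8, 0.6, 1.0, step=9_000, total_steps=10_000))  # ~0.97
```

The point of such a schedule is to give dense process-level feedback while planning behavior is still forming, then hand optimization pressure over to the sparse final-answer signal once subgoal execution has stabilized.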