Think Straight, Stop Smart: Structured Reasoning for Efficient Multi-Hop RAG

📅 2025-10-21
🤖 AI Summary
Existing multi-hop retrieval-augmented generation (RAG) methods suffer from token redundancy, unstable termination, and low inference efficiency in complex reasoning tasks. This paper proposes TSSS, a framework that combines *structured template-based reasoning* with a *retriever-driven deterministic termination mechanism*. First, it uses templated prompts to anchor sub-queries explicitly to the main question and to cache repetitive prefix tokens, which gives the multi-step reasoning a fixed structure. Second, it moves the termination decision into an independent module driven by the retriever, which removes stochastic truncation and redundant token generation. This design makes reasoning more controllable and answers more reliable. Experiments show that TSSS achieves state-of-the-art accuracy on HotpotQA, 2WikiMultiHop, and MuSiQue, while reducing generated token count by over 30%. Its lightweight architecture makes it well suited to deployment in resource-constrained environments such as edge devices.

📝 Abstract
Multi-hop retrieval-augmented generation (RAG) is a promising strategy for complex reasoning, yet existing iterative prompting approaches remain inefficient. They often regenerate predictable token sequences at every step and rely on stochastic stopping, leading to excessive token usage and unstable termination. We propose TSSS (Think Straight, Stop Smart), a structured multi-hop RAG framework designed for efficiency. TSSS introduces (i) a template-based reasoning scheme that caches recurring prefixes and anchors sub-queries to the main question, reducing token generation cost while promoting stable reasoning, and (ii) a retriever-based terminator, which deterministically halts reasoning once additional sub-queries collapse into repetition. This separation of structured reasoning and termination control enables both faster inference and more reliable answers. On HotpotQA, 2WikiMultiHop, and MuSiQue, TSSS achieves state-of-the-art accuracy and competitive efficiency among RAG-CoT approaches, highlighting its effectiveness in efficiency-constrained scenarios such as on-device inference.
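
The retriever-based terminator described above can be illustrated with a small sketch. The abstract does not spell out the exact stopping criterion, so this example assumes one plausible reading: a hop "collapses into repetition" when its sub-query retrieves no documents that earlier hops have not already seen. The names `should_stop`, `run_hops`, `retrieve`, and `min_new` are hypothetical and not taken from the paper.

```python
from typing import Callable, Iterable, List, Set


def should_stop(retrieved_ids: Iterable[str], seen_ids: Set[str], min_new: int = 1) -> bool:
    """Return True when a hop adds fewer than `min_new` unseen documents.

    Assumption: repetition is measured by overlap of retrieved document IDs.
    """
    new_ids = set(retrieved_ids) - seen_ids
    return len(new_ids) < min_new


def run_hops(
    sub_queries: List[str],
    retrieve: Callable[[str], List[str]],
    max_hops: int = 5,
) -> List[str]:
    """Process sub-queries in order and halt deterministically on repetition.

    Returns the sub-queries that contributed new evidence.
    """
    seen: Set[str] = set()
    used: List[str] = []
    for query in sub_queries[:max_hops]:
        ids = retrieve(query)
        if should_stop(ids, seen):
            break  # no sampling involved: same inputs always stop at the same hop
        seen.update(ids)
        used.append(query)
    return used
```

Because the decision depends only on retrieval results, the stopping point does not vary between runs, in contrast to letting the language model decide when to emit a final answer.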

Problem

Research questions and friction points this paper is trying to address.

Iterative prompting regenerates predictable token sequences at every step, inflating token generation cost in multi-hop RAG
Stochastic stopping makes reasoning termination unstable
Both issues limit efficiency-constrained settings such as on-device inference

Innovation

Methods, ideas, or system contributions that make the work stand out.

Template-based reasoning caches prefixes and anchors sub-queries
Retriever-based terminator deterministically halts repetitive sub-queries
Separating structured reasoning from termination control enables faster inference and more reliable answers
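
The template idea in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's prompt: the template wording, `TEMPLATE_PREFIX`, `HOP_SUFFIX`, `build_prompt`, and `shared_prefix` are all hypothetical. The point it shows is that when every hop's prompt starts with the same fixed text and restates the main question, that prefix is supplied rather than generated, and it is identical across hops, so a serving stack with prefix caching could reuse it.

```python
import os
from typing import List

# Hypothetical template: fixed instructions plus the main question as an anchor.
TEMPLATE_PREFIX = (
    "Answer the main question by asking one sub-question at a time.\n"
    "Main question: {question}\n"
)
HOP_SUFFIX = "Facts so far: {facts}\nNext sub-question about the main question:"


def build_prompt(question: str, facts: List[str]) -> str:
    """Build the prompt for one hop; only the facts change between hops."""
    prefix = TEMPLATE_PREFIX.format(question=question)
    return prefix + HOP_SUFFIX.format(facts="; ".join(facts) or "none")


def shared_prefix(prompts: List[str]) -> str:
    """Longest character-level prefix common to all prompts (the cacheable part)."""
    return os.path.commonprefix(prompts)
```

In this sketch the model would only need to generate the text after the final colon, which is one way to read the claim that templating reduces token generation cost.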