AI Summary
Real-world complex search tasks demand deep cross-source reasoning and knowledge fusion, yet existing RAG systems and monolithic reasoning models suffer from tightly coupled planning and execution, poor scalability, and low inference efficiency. To address these limitations, we propose HiRA, a hierarchical reasoning framework that decouples high-level task planning from low-level execution for the first time. HiRA introduces domain-specific agent collaboration mechanisms to support task decomposition, tool invocation, chain-of-thought reasoning, and structured result fusion, thereby isolating strategic decision-making from operational details and significantly enhancing reasoning depth and knowledge integration capability. Evaluated on four cross-modal deep search benchmarks, HiRA consistently outperforms state-of-the-art RAG and multi-agent approaches in both answer quality and system efficiency, empirically validating the effectiveness and generality of the hierarchical decoupling paradigm.
Abstract
Complex information needs in real-world search scenarios demand deep reasoning and knowledge synthesis across diverse sources, which traditional retrieval-augmented generation (RAG) pipelines struggle to address effectively. Current reasoning-based approaches suffer from a fundamental limitation: they use a single model to handle both high-level planning and detailed execution, leading to inefficient reasoning and limited scalability. In this paper, we introduce HiRA, a hierarchical framework that separates strategic planning from specialized execution. Our approach decomposes complex search tasks into focused subtasks, assigns each subtask to domain-specific agents equipped with external tools and reasoning capabilities, and coordinates the results through a structured integration mechanism. This separation prevents execution details from disrupting high-level reasoning while enabling the system to leverage specialized expertise for different types of information processing. Experiments on four complex, cross-modal deep search benchmarks demonstrate that HiRA significantly outperforms state-of-the-art RAG and agent-based systems. Our results show improvements in both answer quality and system efficiency, highlighting the effectiveness of decoupled planning and execution for multi-step information seeking tasks. Our code is available at https://github.com/ignorejjj/HiRA.
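The separation described above (a planner that decomposes the task, expert agents that execute subtasks, and an integration step that returns only distilled results) can be illustrated with a minimal sketch. This is not the HiRA implementation from the linked repository: the names `Subtask`, `plan`, `coordinate`, `answer`, and the two stub agents are hypothetical, and the planner and agents are hard-coded stand-ins for what would be reasoning models with real tools.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subtask:
    kind: str   # hypothetical label selecting an expert agent
    query: str  # focused instruction for that agent


@dataclass
class SubtaskResult:
    subtask: Subtask
    summary: str  # distilled finding handed back to the planner


# Stub expert agents (assumption): real agents would call external tools.
def search_agent(query: str) -> str:
    return f"[search] findings for: {query}"


def code_agent(query: str) -> str:
    return f"[code] computed result for: {query}"


AGENTS: Dict[str, Callable[[str], str]] = {
    "search": search_agent,
    "code": code_agent,
}


def plan(question: str) -> List[Subtask]:
    """Toy planner: a real one would be a reasoning model emitting subtasks."""
    return [
        Subtask("search", f"collect facts relevant to: {question}"),
        Subtask("code", f"aggregate the collected facts for: {question}"),
    ]


def coordinate(subtask: Subtask) -> SubtaskResult:
    """Route a subtask to its agent and keep only the agent's summary."""
    agent = AGENTS.get(subtask.kind)
    if agent is None:
        raise KeyError(f"no agent registered for kind {subtask.kind!r}")
    return SubtaskResult(subtask, agent(subtask.query))


def answer(question: str) -> str:
    """Decompose, delegate, then integrate the per-subtask summaries."""
    results = [coordinate(st) for st in plan(question)]
    return "\n".join(r.summary for r in results)
```

In this sketch the planner only ever sees the `summary` strings, never the agents' intermediate work, which mirrors the abstract's point that execution details should not disrupt high-level reasoning.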