🤖 AI Summary
Existing LLM-based agent frameworks rely on sequential execution for multi-tool collaborative tasks, resulting in suboptimal efficiency. This paper proposes the first Directed Acyclic Graph (DAG)-based parallel reasoning framework, which explicitly models task dependencies as a DAG and introduces three core components: (1) dynamic graph optimization for runtime dependency refinement, (2) a concurrent execution engine enabling parallel subtask processing, and (3) a lightweight summarization module for intermediate state compression. Crucially, knowledge distillation is integrated to enhance generalization across diverse tool compositions. Evaluated on the BrowseComp and xbench-DeepSearch benchmarks, the method achieves 67.7% and 83% accuracy, respectively, outperforming serial baselines while reducing average execution steps by 35%. The framework is model-agnostic and integrates with various LLM backbones without architectural modification.
📝 Abstract
Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks when equipped with external tools. However, current frameworks predominantly rely on sequential processing, leading to inefficient execution, particularly for tasks requiring extensive tool interaction. This paper introduces Flash-Searcher, a novel parallel agent reasoning framework that fundamentally reimagines the execution paradigm from sequential chains to directed acyclic graphs (DAGs). Flash-Searcher decomposes complex tasks into subtasks with explicit dependencies, enabling concurrent execution of independent reasoning paths while maintaining logical constraints. Through dynamic workflow optimization, our framework continuously refines the execution graph based on intermediate results, and it integrates a summarization module to compress intermediate state. Comprehensive evaluations across multiple benchmarks demonstrate that Flash-Searcher consistently outperforms existing approaches. Specifically, it achieves 67.7% accuracy on BrowseComp and 83% on xbench-DeepSearch, while reducing agent execution steps by up to 35% compared to current frameworks. Furthermore, when distilling this parallel reasoning pipeline into single models, we observe substantial performance gains across diverse backbone architectures, underscoring the generalizability of our methodology. Our work thus represents a significant advance in agent architecture design, offering a more scalable and efficient paradigm for complex reasoning tasks.
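The core execution idea described above — running independent subtasks concurrently while honoring DAG dependencies — can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's actual implementation: `run_dag` and `worker` are hypothetical names, subtasks are plain strings, and a thread pool stands in for the framework's concurrent execution engine.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_dag(tasks, deps, worker):
    """Execute subtasks concurrently while respecting DAG dependencies.

    tasks:  list of subtask ids
    deps:   dict mapping subtask id -> set of prerequisite ids
    worker: callable(task_id, prereq_results) -> result
    """
    # Remaining unmet prerequisites for each subtask.
    remaining = {t: set(deps.get(t, ())) for t in tasks}
    results = {}

    with ThreadPoolExecutor() as pool:
        futures = {}  # task id -> in-flight future

        def submit_ready():
            # Launch every subtask whose prerequisites are all satisfied.
            for t in tasks:
                if not remaining[t] and t not in futures and t not in results:
                    prereq_results = {p: results[p] for p in deps.get(t, ())}
                    futures[t] = pool.submit(worker, t, prereq_results)

        submit_ready()
        while len(results) < len(tasks):
            # Wait for at least one subtask to finish, then unlock dependents.
            done, _ = wait(futures.values(), return_when=FIRST_COMPLETED)
            for t, f in list(futures.items()):
                if f in done:
                    results[t] = f.result()
                    del futures[t]
                    for unmet in remaining.values():
                        unmet.discard(t)
            submit_ready()
    return results
```

For example, two independent search subtasks and one synthesis step that depends on both would run as: `run_dag(["search_A", "search_B", "synthesize"], {"synthesize": {"search_A", "search_B"}}, worker)`, where the two searches execute in parallel and `synthesize` starts only once both results are available.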