🤖 AI Summary
This work addresses three problems that autonomous agents face in long-horizon deep research tasks: accumulating contextual noise, error propagation, and limited modular extensibility. The authors propose Yunque DeepResearch, a hierarchical, modular, and robust framework that combines centralized multi-agent orchestration, an atomic capability pool of tools and specialized sub-agents, dynamic context management that condenses completed sub-goals into semantic summaries, and a supervisor module that performs active anomaly detection and context pruning. The framework achieves state-of-the-art performance across multiple benchmarks, including GAIA, BrowseComp, BrowseComp-ZH, and Humanity's Last Exam, and the authors open-source the framework, reproducible implementations, and application cases.
📝 Abstract
Deep research has emerged as a transformative capability for autonomous agents, empowering Large Language Models to navigate complex, open-ended tasks. However, realizing its full potential is hindered by critical limitations, including escalating contextual noise in long-horizon tasks, fragility leading to cascading errors, and a lack of modular extensibility. To address these challenges, we introduce Yunque DeepResearch, a hierarchical, modular, and robust framework. The architecture is characterized by three key components: (1) a centralized Multi-Agent Orchestration System that routes subtasks to an Atomic Capability Pool of tools and specialized sub-agents; (2) a Dynamic Context Management mechanism that structures completed sub-goals into semantic summaries to mitigate information overload; and (3) a proactive Supervisor Module that ensures resilience through active anomaly detection and context pruning. Yunque DeepResearch achieves state-of-the-art performance across a range of agentic deep research benchmarks, including GAIA, BrowseComp, BrowseComp-ZH, and Humanity's Last Exam. We open-source the framework, reproducible implementations, and application cases to empower the community.
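The three components described in the abstract can be illustrated with a toy sketch. This is not the Yunque DeepResearch implementation: every name here (`Context`, `Supervisor`, `Orchestrator`, the `search` and `calc` capabilities) is hypothetical, the "semantic summary" is a simple stand-in for what would presumably be an LLM-written summary, and the anomaly rule (two identical consecutive steps) is an assumption chosen only to keep the example small.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Context:
    """Working memory: summaries of finished sub-goals plus raw recent steps."""
    summaries: List[str] = field(default_factory=list)
    recent_steps: List[str] = field(default_factory=list)

    def close_subgoal(self, goal: str) -> None:
        # Stand-in for a semantic summary: keep only the last step of the
        # sub-goal, then discard the raw step log to limit context growth.
        outcome = self.recent_steps[-1] if self.recent_steps else "no result"
        self.summaries.append(f"{goal}: {outcome}")
        self.recent_steps.clear()


class Supervisor:
    """Assumed anomaly rule: two identical consecutive steps count as a loop."""

    def check_and_prune(self, ctx: Context) -> bool:
        steps = ctx.recent_steps
        if len(steps) >= 2 and steps[-1] == steps[-2]:
            steps.pop()  # prune the redundant step from the context
            return True
        return False


class Orchestrator:
    """Central router that dispatches each step to a capability in the pool."""

    def __init__(self, pool: Dict[str, Callable[[str], str]]) -> None:
        self.pool = pool
        self.ctx = Context()
        self.supervisor = Supervisor()
        self.anomalies = 0

    def run_subgoal(self, goal: str, plan: List[Tuple[str, str]]) -> None:
        for capability, query in plan:
            result = self.pool[capability](query)
            self.ctx.recent_steps.append(f"{capability}({query}) -> {result}")
            if self.supervisor.check_and_prune(self.ctx):
                self.anomalies += 1
        self.ctx.close_subgoal(goal)


# Hypothetical capability pool: plain functions standing in for tools/sub-agents.
pool = {
    "search": lambda q: f"results for {q}",
    "calc": lambda q: str(sum(int(x) for x in q.split("+"))),
}

orch = Orchestrator(pool)
# The repeated search step simulates an agent stuck in a loop.
orch.run_subgoal("find population", [("search", "city population"),
                                     ("search", "city population")])
orch.run_subgoal("add figures", [("calc", "2+3")])
print(orch.ctx.summaries)
print("anomalies pruned:", orch.anomalies)
```

In this sketch the orchestrator never sees more than the summaries of earlier sub-goals plus the raw steps of the current one, which is the intuition behind summarizing completed sub-goals; a real system would replace the stand-in summary and the duplicate-step rule with model-driven logic.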