🤖 AI Summary
Problem: Existing autonomous agents suffer from limited robustness, poor adaptability, and fragmented design paradigms. Method: This paper proposes a unified agent architecture featuring: (1) a collective multi-agent collaboration framework with a critic-model voting mechanism to enhance decision reliability; (2) a three-tier hierarchical memory system (working, semantic, and procedural memory) that enables dynamic memory management and cross-task knowledge reuse; and (3) a paired planning agent and execution agent, tool augmentation (web search, code execution, multimodal parsing), and modular system fusion. Contribution/Results: Evaluated on comprehensive benchmarks, the architecture consistently outperforms leading open-source baselines and approaches the performance of commercial closed-source systems, demonstrating generalization across domains, scalability, and robust adaptability.
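The summary does not say how the critic-model votes are aggregated, so the following is only a minimal sketch that assumes a plain majority vote over candidate answers. The names `Critic` and `vote`, and the tie-breaking rule (lowest candidate index wins), are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical critic interface: given the task and the candidate answers,
# return the index of the candidate this critic prefers.
Critic = Callable[[str, Sequence[str]], int]


def vote(task: str, candidates: Sequence[str], critics: List[Critic]) -> str:
    """Return the candidate preferred by the most critics (assumed majority rule)."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    if not critics:
        raise ValueError("at least one critic is required")

    ballots: Counter = Counter()
    for critic in critics:
        choice = critic(task, candidates)
        if not 0 <= choice < len(candidates):
            raise ValueError(f"critic returned out-of-range index {choice}")
        ballots[choice] += 1

    # Most votes wins; ties go to the lowest candidate index (an assumption).
    best_index = min(ballots, key=lambda i: (-ballots[i], i))
    return candidates[best_index]


# Toy critics standing in for model calls.
def prefers_longest(task: str, candidates: Sequence[str]) -> int:
    return max(range(len(candidates)), key=lambda i: len(candidates[i]))


def prefers_first(task: str, candidates: Sequence[str]) -> int:
    return 0


if __name__ == "__main__":
    answers = ["short", "a longer answer"]
    print(vote("example task", answers, [prefers_longest, prefers_longest, prefers_first]))
```

In a real system each critic would wrap a model call that scores or ranks the candidates produced by the execution agents; the toy critics here only make the aggregation step concrete.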
📝 Abstract
Large language models (LLMs) are increasingly deployed as autonomous agents for complex real-world tasks, yet existing systems often focus on isolated improvements without a unifying design for robustness and adaptability. We propose a generalist agent architecture that integrates three core components: a collective multi-agent framework that combines planning and execution agents with critic-model voting; a hierarchical memory system spanning working, semantic, and procedural layers; and a refined tool suite for search, code execution, and multimodal parsing. Evaluated on a comprehensive benchmark, our framework consistently outperforms open-source baselines and approaches the performance of proprietary systems. These results demonstrate the importance of system-level integration and highlight a path toward scalable, resilient, and adaptive AI assistants capable of operating across diverse domains and tasks.
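To make the three memory layers concrete, here is a small sketch under stated assumptions: working memory is modeled as a bounded buffer of recent notes, semantic memory as a key-to-fact store, and procedural memory as reusable step lists keyed by task type. The abstract does not describe the actual data structures, so the class `HierarchicalMemory` and all of its methods are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Sequence


@dataclass
class HierarchicalMemory:
    """Illustrative three-layer memory; the layout is an assumption, not the paper's design."""

    working_capacity: int = 4
    working: Deque[str] = field(init=False)                           # recent notes for the current task
    semantic: Dict[str, str] = field(default_factory=dict)            # facts reusable across tasks
    procedural: Dict[str, List[str]] = field(default_factory=dict)    # step lists per task type

    def __post_init__(self) -> None:
        # A bounded deque silently drops the oldest note once capacity is reached.
        self.working = deque(maxlen=self.working_capacity)

    def observe(self, note: str) -> None:
        self.working.append(note)

    def remember_fact(self, key: str, fact: str) -> None:
        self.semantic[key] = fact

    def remember_procedure(self, task_type: str, steps: Sequence[str]) -> None:
        self.procedural[task_type] = list(steps)

    def context_for(self, task_type: str, fact_keys: Sequence[str]) -> dict:
        """Assemble what an agent might be given before starting a new task."""
        return {
            "working": list(self.working),
            "facts": {k: self.semantic[k] for k in fact_keys if k in self.semantic},
            "procedure": list(self.procedural.get(task_type, [])),
        }


if __name__ == "__main__":
    memory = HierarchicalMemory(working_capacity=2)
    for note in ["opened page", "found table", "parsed table"]:
        memory.observe(note)
    memory.remember_fact("units", "values are reported in USD")
    memory.remember_procedure("web_lookup", ["search", "open result", "extract answer"])
    print(memory.context_for("web_lookup", ["units", "missing_key"]))
```

The point of the split is that the working layer is discarded or overwritten per task, while the semantic and procedural layers persist, which is one plausible way to obtain the cross-task reuse the abstract refers to.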