🤖 AI Summary
This work proposes and systematically evaluates multiple multi-agent collaboration frameworks to overcome the cognitive limitations of single large language models in automated scientific research. Using an execution-based testbed built on Git worktree isolation and explicit global memory, the study compares single-agent, parallel sub-agent, and expert-team architectures for automated machine learning optimization under a fixed computational budget. The findings reveal a fundamental trade-off between runtime stability and theoretical depth: the sub-agent architecture is highly robust and excels at breadth-first search under tight time constraints, whereas the expert-team architecture, despite lower fault tolerance, achieves the deep theoretical alignment needed for complex architectural refactoring when sufficient compute is available. The study accordingly advocates a design principle of dynamically adapting the collaboration structure to task complexity.
📝 Abstract
As AI agents evolve, the community is rapidly shifting from single Large Language Models (LLMs) to Multi-Agent Systems (MAS) to overcome cognitive bottlenecks in automated research. However, the optimal coordination framework for these autonomous agents remains largely unexplored. In this paper, we present a systematic empirical study of the comparative efficacy of distinct multi-agent structures for automated machine learning optimization. Using a rigorously controlled, execution-based testbed equipped with Git worktree isolation and explicit global memory, we benchmark a single-agent baseline against two multi-agent paradigms: a sub-agent architecture (parallel exploration with post-hoc consolidation) and an agent-team architecture (experts with pre-execution handoffs). Evaluating these systems under strictly fixed computational time budgets, our findings reveal a fundamental trade-off between operational stability and theoretical deliberation. The sub-agent mode functions as a highly resilient, high-throughput search engine, optimal for broad, shallow optimizations under strict time constraints. Conversely, the agent-team topology exhibits higher operational fragility due to multi-author code generation but achieves the deep theoretical alignment necessary for complex architectural refactoring given extended compute budgets. These empirical insights provide actionable guidelines for designing future automated-research systems, advocating for dynamically routed architectures that adapt their collaborative structures to real-time task complexity.
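The Git worktree isolation the testbed relies on can be sketched roughly as follows: each agent gets its own checkout of the shared repository on its own branch, so parallel edits never clobber one another. This is a minimal illustration, not the paper's actual harness; the repository path, branch names, and agent names are hypothetical.

```shell
#!/bin/sh
# Sketch: per-agent isolation via git worktrees.
# All paths and names below are illustrative assumptions.
set -e

repo=$(mktemp -d)          # throwaway repository for the demo
cd "$repo"
git init -q .
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "baseline"

# One worktree per agent, each on its own branch rooted at the baseline.
# Agents can now edit, run, and commit independently; a consolidation
# step would later merge or cherry-pick the surviving branches.
for agent in agent1 agent2; do
    git worktree add -q -b "$agent" "../wt-$agent" HEAD
done

git worktree list
```

After the loop, `git worktree list` shows the main checkout plus one isolated working directory per agent branch, which is the property the parallel sub-agent mode depends on.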