🤖 AI Summary
Biomedical research faces a knowledge-acquisition bottleneck because data, tools, and literature are severely fragmented, and existing AI agents are constrained by static toolsets that limit their adaptability and scalability. To address this, we propose STELLA, a multi-agent self-evolving architecture featuring (i) an evolvable reasoning template library and (ii) a “tool ocean” mechanism that enables automated tool discovery, validation, and dynamic integration. Coupled with large language model–driven dynamic knowledge organization and experience-informed strategy optimization, the framework lets the agent's capabilities grow continuously and autonomously. On benchmarks including *Humanity’s Last Exam: Biomedicine*, the system scores approximately 26% initially, and its accuracy nearly doubles as experience accumulates over repeated trials; across the evaluated benchmarks it outperforms leading models by up to 6 percentage points. The approach offers a scalable, adaptive paradigm for open-domain scientific AI agents.
📝 Abstract
The rapid growth of biomedical data, tools, and literature has created a fragmented research landscape that outpaces human expertise. While AI agents offer a solution, they typically rely on static, manually curated toolsets, which limits their ability to adapt and scale. Here, we introduce STELLA, a self-evolving AI agent designed to overcome these limitations. STELLA employs a multi-agent architecture that autonomously improves its own capabilities through two core mechanisms: an evolving Template Library of reasoning strategies and a dynamic Tool Ocean that expands as a Tool Creation Agent automatically discovers and integrates new bioinformatics tools. Together, these mechanisms allow STELLA to learn from experience. We demonstrate that STELLA achieves state-of-the-art accuracy on a suite of biomedical benchmarks, scoring approximately 26% on Humanity's Last Exam: Biomedicine, 54% on LAB-Bench: DBQA, and 63% on LAB-Bench: LitQA, outperforming leading models by up to 6 percentage points. More importantly, we show that its performance improves systematically with experience; for instance, its accuracy on the Humanity's Last Exam benchmark almost doubles as the number of trials increases. STELLA represents a significant advance towards AI agent systems that can learn and grow, dynamically scaling their expertise to accelerate the pace of biomedical discovery.
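To make the two mechanisms concrete, the following is a minimal Python sketch of how a growing tool registry and a template library might interact. It is an illustration under stated assumptions, not STELLA's actual implementation: the class names (`Template`, `ToolOcean`, `SelfEvolvingAgent`), the one-probe validation rule, the `CATALOG` lookup that stands in for a Tool Creation Agent, and the two toy sequence tools are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Tool = Callable[[str], str]
# A candidate tool plus one validation probe: (function, probe input, expected output).
Candidate = Tuple[Tool, str, str]


@dataclass
class Template:
    """Hypothetical reasoning template: an ordered list of tool names."""
    name: str
    steps: List[str]
    uses: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        return self.successes / self.uses if self.uses else 0.0


@dataclass
class ToolOcean:
    """Hypothetical tool registry that can grow at run time."""
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, name: str, candidate: Candidate) -> bool:
        """Admit a tool only if it reproduces the expected probe output."""
        fn, probe, expected = candidate
        try:
            valid = fn(probe) == expected
        except Exception:
            valid = False
        if valid:
            self.tools[name] = fn
        return valid


class SelfEvolvingAgent:
    def __init__(self, ocean: ToolOcean,
                 create_tool: Callable[[str], Optional[Candidate]]):
        self.ocean = ocean
        self.create_tool = create_tool  # assumed stand-in for a tool-creation agent
        self.library: List[Template] = []

    def solve(self, template: Template, query: str) -> Optional[str]:
        """Run the template's steps in order, acquiring missing tools on demand."""
        template.uses += 1
        value = query
        for step in template.steps:
            if step not in self.ocean.tools:
                candidate = self.create_tool(step)
                if candidate is None or not self.ocean.register(step, candidate):
                    return None  # no validated tool available; the attempt fails
            value = self.ocean.tools[step](value)
        template.successes += 1
        # Keep templates that have worked at least once for later reuse.
        if all(t.name != template.name for t in self.library):
            self.library.append(template)
        return value


# Toy tools, for illustration only.
def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_fraction(seq: str) -> str:
    return f"{(seq.count('G') + seq.count('C')) / len(seq):.2f}"


# Hypothetical catalog that the stand-in tool creator searches by name.
CATALOG: Dict[str, Candidate] = {
    "reverse_complement": (reverse_complement, "AAGC", "GCTT"),
    "gc_fraction": (gc_fraction, "GGCC", "1.00"),
}

agent = SelfEvolvingAgent(ToolOcean(), CATALOG.get)
template = Template("rc_then_gc", ["reverse_complement", "gc_fraction"])
print(agent.solve(template, "ATGC"))  # prints 0.50
print(sorted(agent.ocean.tools))      # both tools were added on demand
```

In this sketch the registry starts empty and gains a tool only when a template needs it and the candidate passes its probe, while the per-template success counts are the kind of experience signal that could guide which strategy to try first on later tasks.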