AI Summary
This work addresses a limitation of existing agentic systems: they rely on static, hand-curated toolsets and therefore struggle to adapt to new domains or evolving scientific computing libraries. The authors propose El Agente Forjador, a multi-agent framework in which coding agents autonomously build, validate, and reuse computational tools through a four-stage workflow: tool analysis, tool generation, task execution, and iterative solution evaluation. In this paradigm, agent capabilities are defined by the tasks to be solved rather than by explicitly engineered implementations, and weaker agents can improve their results by reusing tools built by stronger agents. Across 24 quantum chemistry and quantum dynamics tasks on five coding agent setups, tool generation and reuse consistently improve accuracy over direct problem-solving with the coding agents. Reusing a toolset built by a stronger agent substantially raises solution quality for weaker agents and can reduce API cost.
Abstract
AI for science promises to accelerate the discovery process. The advent of large language models (LLMs) and agentic workflows is expediting a growing range of scientific tasks. However, most current agentic systems depend on static, hand-curated toolsets that hinder adaptation to new domains and evolving libraries. We present El Agente Forjador, a multi-agent framework in which universal coding agents autonomously forge, validate, and reuse computational tools through a four-stage workflow of tool analysis, tool generation, task execution, and iterative solution evaluation. Using 24 tasks spanning quantum chemistry and quantum dynamics and five coding agent setups, we compare three operating modes: zero-shot generation of tools per task, reuse of a curriculum-built toolset, and direct problem-solving with the coding agents as the baseline. We find that our tool generation and reuse framework consistently improves accuracy over the baseline. We also show that reusing a toolset built by a stronger coding agent can reduce API cost and substantially raise the solution quality of weaker coding agents. Case studies further demonstrate that tools forged for different domains can be combined to solve hybrid tasks. Taken together, these results show that LLM-based agents can use their scientific knowledge and coding capabilities to autonomously build reusable scientific tools, pointing toward a paradigm in which agent capabilities are defined by the tasks they are designed to solve rather than by explicitly engineered implementations.
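The four-stage workflow and the reuse mode can be pictured with a toy loop. The sketch below is not the framework's implementation: every name in it (`Task`, `ToolRegistry`, `analyze`, `generate`, `execute`, `evaluate`, `solve`) is hypothetical, the "tools" are trivial arithmetic functions standing in for agent-written scientific code, and evaluation is a comparison against a known answer rather than an agent's judgment. It only illustrates how a shared registry lets a later task skip tool generation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Tool = Callable[[float], float]


@dataclass
class Task:
    """Hypothetical task: apply the named tools, in order, to an input value."""
    needs: List[str]
    value: float
    expected: float  # toy stand-in for whatever the evaluation stage checks


@dataclass
class ToolRegistry:
    """Hypothetical store of forged tools, shared across tasks for reuse."""
    tools: Dict[str, Tool] = field(default_factory=dict)


def analyze(task: Task, registry: ToolRegistry) -> List[str]:
    """Stage 1, tool analysis: which required tools are not in the registry yet?"""
    return [name for name in task.needs if name not in registry.tools]


def generate(name: str) -> Tool:
    """Stage 2, tool generation: placeholder for an agent writing a new tool."""
    catalog: Dict[str, Tool] = {
        "square": lambda x: x * x,
        "negate": lambda x: -x,
    }
    return catalog[name]


def execute(task: Task, registry: ToolRegistry) -> float:
    """Stage 3, task execution: chain the registered tools over the input."""
    result = task.value
    for name in task.needs:
        result = registry.tools[name](result)
    return result


def evaluate(result: float, task: Task) -> bool:
    """Stage 4, solution evaluation: toy check against a known answer."""
    return abs(result - task.expected) < 1e-9


def solve(task: Task, registry: ToolRegistry,
          max_rounds: int = 3) -> Tuple[Optional[float], List[str]]:
    """Run the four stages, retrying up to max_rounds; report newly forged tools."""
    forged: List[str] = []
    for _ in range(max_rounds):
        for name in analyze(task, registry):
            registry.tools[name] = generate(name)
            forged.append(name)
        result = execute(task, registry)
        if evaluate(result, task):
            return result, forged
    return None, forged


registry = ToolRegistry()
first_result, first_forged = solve(Task(["square", "negate"], 3.0, -9.0), registry)
second_result, second_forged = solve(Task(["square"], 4.0, 16.0), registry)
print(first_result, first_forged)
print(second_result, second_forged)
```

In this toy run the first task forges both tools, while the second finds `square` already in the registry and forges nothing. That mirrors, in miniature, the curriculum-built toolset mode, where work done for earlier tasks is reused instead of regenerated.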