🤖 AI Summary
Current large language models lack structured, traceable, and auditable execution mechanisms for scientific workflow automation. This work proposes a single-agent framework that embeds large language model decision-making within a type-safe execution environment and introduces a context management mechanism based on typed symbolic identifiers. By integrating an object-graph mapper with a dynamic knowledge graph, the framework enables structured persistence of computational states and contextual information. The architecture supports robust multi-step parallel computations in tasks such as quantum chemistry, conformational ensemble generation, and metal–organic framework design, demonstrating its scalability, consistency, and efficiency in complex scientific automation scenarios.
📝 Abstract
Large language models (LLMs) are increasingly used to automate scientific workflows, yet their integration with heterogeneous computational tools remains ad hoc and fragile. Current agentic approaches often rely on unstructured text to manage context and coordinate execution, producing overwhelming volumes of information that can obscure decision provenance and hinder auditability. In this work, we present El Agente Gráfico, a single-agent framework that embeds LLM-driven decision-making within a type-safe execution environment and uses dynamic knowledge graphs for external persistence. Central to our approach is a structured abstraction of scientific concepts and an object-graph mapper that represents computational state as typed Python objects, stored either in memory or persisted in an external knowledge graph. This design enables context management through typed symbolic identifiers rather than raw text, thereby ensuring consistency, supporting provenance tracking, and enabling efficient tool orchestration. We evaluate the system with an automated benchmarking framework on a suite of university-level quantum chemistry tasks previously used to evaluate a multi-agent system, demonstrating that a single agent, when coupled to a reliable execution engine, can robustly perform complex, multi-step, and parallel computations. We further extend this paradigm to two other large classes of applications, conformer ensemble generation and metal–organic framework design, where knowledge graphs serve as both memory and reasoning substrates. Together, these results illustrate how abstraction and type safety can provide a scalable foundation for agentic scientific automation beyond prompt-centric designs.
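To make the idea of context management through typed symbolic identifiers concrete, the following is a minimal Python sketch. It is not the framework's actual API: the names `Ref`, `InMemoryStore`, `Molecule`, `EnergyResult`, and `record_energy` are hypothetical, the store is a plain in-memory dictionary standing in for the object-graph mapper, and the energy value is a dummy number rather than a computed result.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Generic, Type, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Ref(Generic[T]):
    """Hypothetical typed symbolic identifier: names an object without its contents."""
    kind: Type[T]
    key: str

    def __str__(self) -> str:
        # Short token that could appear in the agent's context instead of raw data.
        return f"{self.kind.__name__}:{self.key}"


@dataclass(frozen=True)
class Molecule:
    smiles: str


@dataclass(frozen=True)
class EnergyResult:
    molecule: Ref[Molecule]  # provenance link back to the input object
    method: str
    energy_hartree: float


class InMemoryStore:
    """Assumed stand-in for an object-graph mapper; keeps typed objects in a dict."""

    def __init__(self) -> None:
        self._objects: Dict[Ref, object] = {}
        self._ids = count(1)

    def put(self, obj: T) -> Ref[T]:
        ref = Ref(type(obj), str(next(self._ids)))
        self._objects[ref] = obj
        return ref

    def get(self, ref: Ref[T], expected: Type[T]) -> T:
        # Reject identifiers of the wrong type before any tool code runs.
        if ref.kind is not expected:
            raise TypeError(f"expected {expected.__name__}, got {ref}")
        return self._objects[ref]


def record_energy(store: InMemoryStore, mol_ref: Ref[Molecule],
                  method: str, energy_hartree: float) -> Ref[EnergyResult]:
    """Hypothetical tool: stores a result and returns only its identifier."""
    store.get(mol_ref, Molecule)  # validates that the reference names a Molecule
    return store.put(EnergyResult(mol_ref, method, energy_hartree))


store = InMemoryStore()
water = store.put(Molecule("O"))
# -1.0 is a dummy placeholder, not a real computed energy.
result = record_energy(store, water, "HF", -1.0)
print(water, result)
```

In this sketch the agent would only ever see short tokens such as `Molecule:1`, while the full objects and the link from each result to its input stay in the store, which is one way the consistency and provenance properties described above could be realized.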