AI Summary
Large language models (LLMs) rely on static knowledge that goes out of date and are prone to hallucination, both of which undermine their reliability in materials computation. Method: We propose an autonomous agent framework tailored to first-principles calculations, built around a multi-step reasoning mechanism that embeds domain-specific physical constraints. It integrates a domain-knowledge-enhanced LLM, density functional theory (DFT) tool invocation, retrieval-augmented generation (RAG), and a physics-consistency verification module. Contribution/Results: We introduce a new benchmark designed specifically for autonomous materials computation, enabling verifiable, end-to-end computational experiments. On this benchmark, the framework significantly improves task accuracy and robustness over standalone LLMs while delivering fully automated, reproducible, and physics-grounded computational workflows. It narrows the reliability gap between general-purpose LLMs and rigorous scientific computing in materials science.
Abstract
Large language models (LLMs) have emerged as powerful tools for accelerating scientific discovery, yet their static knowledge and tendency to hallucinate hinder their use in autonomous research. Recent advances integrate LLMs into agentic frameworks, enabling retrieval, reasoning, and tool use for complex scientific workflows. Here, we present a domain-specialized agent designed for reliable automation of first-principles materials computations. By embedding domain expertise, the agent keeps multi-step workflows physically coherent and consistently selects well-posed parameters that yield converged calculations, thereby enabling reliable end-to-end computational execution. A new benchmark of diverse computational tasks demonstrates that our system significantly outperforms standalone LLMs in both accuracy and robustness. This work establishes a verifiable foundation for autonomous computational experimentation and represents a key step toward fully automated scientific discovery.