Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents

📅 2025-03-31
🤖 AI Summary
Problem: Scientific discovery faces persistent challenges, including data complexity, barriers to interdisciplinary collaboration, and insufficient reproducibility, that limit the efficacy of conventional AI agents in research contexts. Method: This paper establishes a domain-specific LLM-based agent paradigm for scientific tasks and proposes a tripartite architecture of domain-knowledge embedding, a closed-loop toolchain, and multi-layer verification, formally characterizing the essential properties of scientific agents for the first time. It designs a multidimensional evaluation framework spanning hypothesis generation, experimental design, and data analysis, and integrates symbolic computation (SymPy), numerical libraries (NumPy), scholarly APIs, and simulation interfaces to enable interpretable, reproducible, and collaborative automated research workflows. Contribution/Results: Synthesizing over 100 state-of-the-art studies, the work identifies critical technical bottlenecks and ethical risks, and delivers a domain-adaptive roadmap for the systematic development of scientific agents.
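To make the "closed-loop toolchain plus multi-layer verification" idea concrete, here is a minimal sketch in plain Python. It is not the survey's implementation: the names `Check`, `run_closed_loop`, `trapezoid_x_squared`, and `double_resolution` are hypothetical, and the "tool" is a toy trapezoid-rule integrator standing in for a real simulation or numerical library. The loop runs a tool on a candidate setting, passes the result through several independent checks, and refines the candidate when any check fails.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Check:
    """One verification layer (hypothetical name): a label plus a pass/fail test."""
    name: str
    passes: Callable[[float], bool]


def run_closed_loop(
    initial: int,
    refine: Callable[[int, List[str]], int],
    tool: Callable[[int], float],
    checks: List[Check],
    max_rounds: int = 5,
) -> Optional[Tuple[int, float, int]]:
    """Run tool -> verify -> refine until every check passes or rounds run out."""
    candidate = initial
    for round_no in range(1, max_rounds + 1):
        result = tool(candidate)
        failed = [c.name for c in checks if not c.passes(result)]
        if not failed:
            return candidate, result, round_no
        # Feed the failed check names back so the next proposal can adapt.
        candidate = refine(candidate, failed)
    return None  # no candidate satisfied all verification layers


def trapezoid_x_squared(n: int) -> float:
    """Toy 'tool': trapezoid-rule estimate of the integral of x^2 over [0, 1]."""
    h = 1.0 / n
    total = 0.5 * (0.0 ** 2 + 1.0 ** 2)
    for i in range(1, n):
        total += (i * h) ** 2
    return total * h


def double_resolution(n: int, failed: List[str]) -> int:
    """Toy refinement: double the number of intervals after any failure."""
    return 2 * n


# Two verification layers: a cheap plausibility check and an accuracy check
# against the known analytic value 1/3 (tolerance chosen for illustration).
checks = [
    Check("plausible range", lambda r: 0.0 < r < 1.0),
    Check("matches analytic value", lambda r: abs(r - 1.0 / 3.0) < 1e-2),
]

outcome = run_closed_loop(2, double_resolution, trapezoid_x_squared, checks)
```

With these toy settings the loop rejects the first two candidates on the accuracy check and accepts 8 intervals on the third round. In a real scientific agent the refinement step would typically be an LLM call and the checks might combine symbolic, numerical, and literature-based validation; that mapping is an assumption here, not something this summary specifies.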

📝 Abstract
As scientific research becomes increasingly complex, innovative tools are needed to manage vast data, facilitate interdisciplinary collaboration, and accelerate discovery. Large language models (LLMs) are now evolving into LLM-based scientific agents that automate critical tasks, ranging from hypothesis generation and experiment design to data analysis and simulation. Unlike general-purpose LLMs, these specialized agents integrate domain-specific knowledge, advanced tool sets, and robust validation mechanisms, enabling them to handle complex data types, ensure reproducibility, and drive scientific breakthroughs. This survey provides a focused review of the architectures, design, benchmarks, applications, and ethical considerations surrounding LLM-based scientific agents. We highlight why they differ from general agents and the ways in which they advance research across various scientific fields. By examining their development and challenges, this survey offers a comprehensive roadmap for researchers and practitioners to harness these agents for more efficient, reliable, and ethically sound scientific discovery.
Problem

Research questions and friction points this paper is trying to address.

Managing vast data and accelerating scientific discovery
Automating tasks like hypothesis generation and data analysis
Ensuring reproducibility and handling complex scientific data
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based agents automate scientific tasks
Integrate domain knowledge and tools
Ensure reproducibility and handle complex data
Authors
Shuo Ren
State Key Laboratory of Multimodal Artificial Intelligence Systems, Foundation Model Research Center, Institute of Automation, CAS, University of Chinese Academy of Sciences, Beijing, China.
Pu Jian
Institute of Automation, Chinese Academy of Sciences
Multimodal, Machine Learning, NLP
Zhenjiang Ren
State Key Laboratory of Multimodal Artificial Intelligence Systems, Foundation Model Research Center, Institute of Automation, CAS, University of Chinese Academy of Sciences, Beijing, China.
Chunlin Leng
State Key Laboratory of Multimodal Artificial Intelligence Systems, Foundation Model Research Center, Institute of Automation, CAS, University of Chinese Academy of Sciences, Beijing, China.
Can Xie
State Key Laboratory of Multimodal Artificial Intelligence Systems, Foundation Model Research Center, Institute of Automation, CAS, University of Chinese Academy of Sciences, Beijing, China.
Jiajun Zhang
Institute of Automation, Chinese Academy of Sciences
Natural Language Processing, Large Language Models, Multimodal Information Processing