Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents

📅 2025-03-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Scientific discovery faces persistent challenges, including data complexity, barriers to interdisciplinary collaboration, and insufficient reproducibility, which limit the efficacy of conventional AI agents in research contexts. Method: This paper establishes a domain-specific LLM-based agent paradigm for scientific tasks, proposing a novel tripartite architecture of domain-knowledge embedding, a closed-loop toolchain, and multi-layer verification, and formally characterizing the essential properties of scientific agents for the first time. It designs a multidimensional evaluation framework spanning hypothesis generation, experimental design, and data analysis, and integrates symbolic computation (SymPy), numerical libraries (NumPy), scholarly APIs, and simulation interfaces to enable interpretable, reproducible, and collaborative automated research workflows. Contribution/Results: Synthesizing over 100 state-of-the-art studies, the work identifies critical technical bottlenecks and ethical risks, and delivers the first high-fidelity, domain-adaptive roadmap for the systematic development of scientific agents.
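As a rough illustration of what multi-layer verification with the named tools might look like, the sketch below checks an agent-proposed derivative first symbolically with SymPy and then numerically with NumPy. It is not taken from the survey: the function names (`symbolic_check`, `numeric_check`, `verify`), the example expressions, and the choice of two layers are assumptions made here for illustration only.

```python
import numpy as np
import sympy as sp


def symbolic_check(claimed, reference):
    """Layer 1 (hypothetical): expressions agree if their difference simplifies to zero."""
    return sp.simplify(claimed - reference) == 0


def numeric_check(claimed, reference, symbol, samples=200, tol=1e-9):
    """Layer 2 (hypothetical): compare both expressions at random sample points."""
    rng = np.random.default_rng(0)  # fixed seed so the check is reproducible
    xs = rng.uniform(-2.0, 2.0, size=samples)
    f = sp.lambdify(symbol, claimed, "numpy")
    g = sp.lambdify(symbol, reference, "numpy")
    return bool(np.allclose(f(xs), g(xs), atol=tol))


def verify(claimed, reference, symbol):
    """Run both layers and report each result separately."""
    return {
        "symbolic": symbolic_check(claimed, reference),
        "numeric": numeric_check(claimed, reference, symbol),
    }


x = sp.symbols("x")
target = sp.sin(x) * sp.exp(x)
reference = sp.diff(target, x)  # trusted derivative computed by SymPy

# Two candidate answers an agent might propose (illustrative only).
good_claim = sp.exp(x) * (sp.sin(x) + sp.cos(x))
bad_claim = sp.exp(x) * sp.cos(x)

good_report = verify(good_claim, reference, x)
bad_report = verify(bad_claim, reference, x)
print(good_report)
print(bad_report)
```

Reporting the layers separately, rather than as a single pass/fail flag, keeps the result interpretable: a claim that passes numerically but fails symbolically (or the reverse) can be flagged for closer inspection instead of being silently accepted or rejected.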

📝 Abstract

As scientific research becomes increasingly complex, innovative tools are needed to manage vast data, facilitate interdisciplinary collaboration, and accelerate discovery. Large language models (LLMs) are now evolving into LLM-based scientific agents that automate critical tasks, ranging from hypothesis generation and experiment design to data analysis and simulation. Unlike general-purpose LLMs, these specialized agents integrate domain-specific knowledge, advanced tool sets, and robust validation mechanisms, enabling them to handle complex data types, ensure reproducibility, and drive scientific breakthroughs. This survey provides a focused review of the architectures, design, benchmarks, applications, and ethical considerations surrounding LLM-based scientific agents. We highlight why they differ from general agents and the ways in which they advance research across various scientific fields. By examining their development and challenges, this survey offers a comprehensive roadmap for researchers and practitioners to harness these agents for more efficient, reliable, and ethically sound scientific discovery.

Problem

Research questions and friction points this paper is trying to address.

Managing vast data and accelerating scientific discovery

Automating tasks like hypothesis generation and data analysis

Ensuring reproducibility and handling complex scientific data

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based agents automate scientific tasks

Integrate domain knowledge and tools

Ensure reproducibility and handle complex data

🔎 Similar Papers

Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science