Bridging Literature and the Universe Via A Multi-Agent Large Language Model System

📅 2025-07-11
📈 Citations: 0 (Influential: 0)
🤖 AI Summary
Extracting parameters for cosmological simulations from the literature is hampered by heterogeneous formats and by manual conversion that is slow and error-prone. To address this, we propose SimAgents, a multi-agent large language model system tailored for astrophysics that combines domain-specific physical reasoning, tool-augmented execution, and structured inter-agent communication to automate parameter extraction, cross-document consistency verification, and generation of executable simulation scripts. We introduce a benchmark dataset comprising over 40 real-world cosmological simulations and publicly release both the system and the dataset. Experiments show that SimAgents outperforms baseline methods in parameter extraction accuracy and in the syntactic and semantic compliance of the generated scripts. By bridging the gap between scientific literature and numerical simulation infrastructure, SimAgents aims to improve research efficiency and reproducibility in computational cosmology.

📝 Abstract
As cosmological simulations and their associated software become increasingly complex, physicists face the challenge of searching through vast amounts of literature and user manuals to extract simulation parameters from dense academic papers, each using different models and formats. Translating these parameters into executable scripts remains a time-consuming and error-prone process. To improve efficiency in physics research and accelerate the cosmological simulation process, we introduce SimAgents, a multi-agent system designed to automate both parameter configuration from the literature and preliminary analysis for cosmology research. SimAgents is powered by specialized LLM agents capable of physics reasoning, simulation software validation, and tool execution. These agents collaborate through structured communication, ensuring that extracted parameters are physically meaningful, internally consistent, and software-compliant. We also construct a cosmological parameter extraction evaluation dataset by collecting over 40 simulations from papers published on arXiv and in leading journals, covering diverse simulation types. Experiments on the dataset demonstrate strong performance by SimAgents, highlighting its effectiveness and its potential to accelerate scientific research for physicists. Our demonstration video is available at: https://youtu.be/w1zLpm_CaWA. The complete system and dataset are publicly available at https://github.com/xwzhang98/SimAgents.
Problem

Research questions and friction points this paper is trying to address.

Automate extraction of simulation parameters from dense literature
Translate parameters into executable scripts efficiently
Ensure parameters are physically meaningful and software-compliant
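
The three goals above can be illustrated with a minimal Python sketch. It is not the SimAgents implementation: the `SimParams` fields, the range and tolerance checks, and the `Key = value` output layout are all hypothetical, chosen only to show how extracted values might be checked for physical plausibility before being written to a parameter file. The flatness check in particular assumes a flat universe with negligible radiation, which not every simulation adopts.

```python
from dataclasses import dataclass


@dataclass
class SimParams:
    """Hypothetical container for parameters extracted from a paper."""
    omega_m: float       # total matter density parameter
    omega_lambda: float  # dark-energy density parameter
    omega_b: float       # baryon density parameter
    h: float             # dimensionless Hubble parameter
    box_size: float      # comoving box side length (units assumed, e.g. Mpc/h)
    n_part: int          # particles per dimension


def validate(p: SimParams, flat_tol: float = 1e-2) -> list[str]:
    """Return a list of human-readable issues; an empty list means no issue found.

    All thresholds here are illustrative assumptions, not values from the paper.
    """
    issues = []
    if not 0.0 < p.omega_m <= 1.0:
        issues.append("omega_m should lie in (0, 1]")
    if not 0.0 < p.omega_b < p.omega_m:
        issues.append("omega_b should be positive and smaller than omega_m")
    # Assumed flat cosmology, radiation neglected.
    if abs(p.omega_m + p.omega_lambda - 1.0) > flat_tol:
        issues.append("omega_m + omega_lambda deviates from 1 (flatness assumed)")
    if not 0.5 <= p.h <= 1.0:
        issues.append("h is outside the illustrative range [0.5, 1.0]")
    if p.box_size <= 0:
        issues.append("box_size must be positive")
    if p.n_part <= 0:
        issues.append("n_part must be positive")
    return issues


def render_config(p: SimParams) -> str:
    """Render validated parameters as 'Key = value' lines (hypothetical format)."""
    issues = validate(p)
    if issues:
        raise ValueError("; ".join(issues))
    lines = [
        f"Omega0 = {p.omega_m}",
        f"OmegaLambda = {p.omega_lambda}",
        f"OmegaBaryon = {p.omega_b}",
        f"HubbleParam = {p.h}",
        f"BoxSize = {p.box_size}",
        f"Ngrid = {p.n_part}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    params = SimParams(omega_m=0.3, omega_lambda=0.7, omega_b=0.05,
                       h=0.7, box_size=100.0, n_part=512)
    print(render_config(params))
```

Separating `validate` from `render_config` mirrors the idea that consistency checks run before any script is produced, so a failed check surfaces as an explicit list of issues instead of a silently wrong parameter file.
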
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent LLM system automates parameter extraction
Agents ensure physics validation and software compliance
Public dataset with 40+ simulations for evaluation
Xiaowen Zhang
Department of Physics, Carnegie Mellon University, Pittsburgh, PA, USA
Zhenyu Bi
Ph.D. Student, Virginia Tech
Natural Language Processing, Information Retrieval
Xuan Wang
Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
Tiziana Di Matteo
Professor of Econophysics - King's College London
Complex systems, Econophysics, Information filtering, mathematical finance, complex networks
Rupert A. C. Croft
Department of Physics, Carnegie Mellon University, Pittsburgh, PA, USA