LLM-AutoSciLab: Closed-Loop Scientific Discovery via Active Experimentation with LLMs

📅 2026-05-21
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Most approaches to scientific discovery fit models to static datasets, which makes it hard to distinguish among mechanisms that fit local observations but generalize poorly, and they cannot actively acquire informative experimental data. This work proposes LLM-AutoSciLab, described as the first large language model-based closed-loop framework for scientific discovery: it iteratively generates hypotheses, designs experiments that discriminate among them, and refines its mechanistic models as new evidence arrives. Covering both symbolic equation recovery and graph structure recovery, the method achieves symbolic accuracy of 67.6% on NewtonBench and 35.1% on ActiveSciBench-Chem, and 31.1% exact graph recovery on ActiveSciBench-GRN, and it is 2-5× more sample-efficient than the strongest competing baselines.
📝 Abstract
Scientific discovery is a closed-loop process in which hypotheses guide data acquisition and observations refine the hypothesis space. Yet most approaches reduce discovery to supervised learning over fixed datasets, where limited observations can support multiple plausible mechanisms that fit locally but fail to generalize. Thus, the key challenge is selecting informative observations to resolve uncertainty, shifting the focus from static inference to adaptive data acquisition. To address this, we propose LLM-AutoSciLab, a closed-loop framework that couples hypothesis generation with hypothesis-conditioned experiment selection and mechanism refinement. Rather than fitting models to passively collected data, LLM-AutoSciLab iteratively proposes plausible hypotheses, selects informative experiments to distinguish or refine them, and updates its state using the resulting evidence. To evaluate dynamic, closed-loop scientific discovery with active data acquisition, we introduce ActiveSciBench, comprising two datasets: ActiveSciBench-Chem with 57 enzyme-kinetics tasks and ActiveSciBench-GRN with 45 gene-regulatory-network tasks. These datasets model discovery as a budget-constrained process requiring adaptive experiment design, variable selection, and recovery of true mechanisms. Across NewtonBench, ActiveSciBench-Chem, and ActiveSciBench-GRN, LLM-AutoSciLab outperforms prior methods, achieving 67.6% and 35.1% symbolic accuracy on NewtonBench and ActiveSciBench-Chem, respectively, and 31.1% exact graph recovery on ActiveSciBench-GRN. Moreover, hypothesis-guided experimentation is 2-5x more sample-efficient than the strongest competing baselines. Code and data are available at: https://github.com/scientific-discovery/LLM-AutoSciLab
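The propose, select, and update loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The LLM-driven hypothesis generation is replaced here by a fixed, hand-written set of candidate rate laws. The selection rule (query the input where surviving hypotheses disagree most) and the elimination rule (drop a hypothesis whose prediction misses the observation by more than `tol`) are assumptions chosen for illustration. All names (`select_experiment`, `closed_loop`, `HYPOTHESES`, `true_mechanism`) are hypothetical.

```python
import statistics
from typing import Callable, Dict, List, Tuple

Hypothesis = Callable[[float], float]

# Hypothetical candidate rate laws for a toy enzyme-kinetics task.
# All three agree at substrate concentration s = 1.0, so a dataset
# sampled only near that point could not tell them apart.
HYPOTHESES: Dict[str, Hypothesis] = {
    "michaelis_menten": lambda s: 2.0 * s / (1.0 + s),
    "linear": lambda s: s,
    "hill_n2": lambda s: 1.7 * s**2 / (0.7 + s**2),
}


def true_mechanism(s: float) -> float:
    """Stand-in for running a real experiment (noise-free, an assumption)."""
    return 2.0 * s / (1.0 + s)


def select_experiment(alive: Dict[str, Hypothesis], candidates: List[float]) -> float:
    """Pick the candidate input where surviving hypotheses disagree most,
    measured by the population variance of their predictions."""
    return max(
        candidates,
        key=lambda s: statistics.pvariance([h(s) for h in alive.values()]),
    )


def closed_loop(
    hypotheses: Dict[str, Hypothesis],
    candidates: List[float],
    run_experiment: Callable[[float], float],
    budget: int,
    tol: float = 0.1,
) -> Tuple[Dict[str, Hypothesis], List[Tuple[float, float]]]:
    """Run experiments until the budget is spent or one hypothesis remains."""
    alive = dict(hypotheses)
    remaining = list(candidates)
    history: List[Tuple[float, float]] = []
    for _ in range(budget):
        if len(alive) <= 1 or not remaining:
            break
        s = select_experiment(alive, remaining)
        remaining.remove(s)
        observed = run_experiment(s)
        history.append((s, observed))
        # Keep only hypotheses consistent with the new observation.
        alive = {
            name: h for name, h in alive.items() if abs(h(s) - observed) <= tol
        }
    return alive, history


if __name__ == "__main__":
    survivors, log = closed_loop(HYPOTHESES, [0.25, 1.0, 4.0], true_mechanism, budget=3)
    print(sorted(survivors), log)
```

In this toy setting the loop never queries `s = 1.0`, where every candidate predicts the same rate and an observation would carry no information. The first query removes the linear law, and the second separates the two saturating laws.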
Problem

Research questions and friction points this paper is trying to address.

scientific discovery
active experimentation
hypothesis refinement
adaptive data acquisition
uncertainty resolution
Innovation

Methods, ideas, or system contributions that make the work stand out.

closed-loop scientific discovery
active experimentation
hypothesis-guided inference
LLM-based reasoning
adaptive data acquisition