An AI system to help scientists write expert-level empirical software

📅 2025-09-08
🤖 AI Summary
Scientific discovery is slowed by the manual effort of developing high-quality scientific software. To address this, the paper proposes an AI system that combines large language models (LLMs) with tree search (TS) to optimize a quantifiable quality metric, exploring, combining, and implementing research ideas from external sources across domains. The system writes the code itself, drawing on external tools and domain knowledge and validating candidates iteratively against the metric. On bioinformatics and epidemiology tasks, it generated 40 novel single-cell analysis methods that outperformed the top human-developed methods on a public leaderboard, and 14 COVID-19 hospitalization forecasting models that outperformed the CDC ensemble and all other individual models. The results support the feasibility of AI-driven generation of expert-level scientific software.

📝 Abstract
The cycle of scientific discovery is frequently bottlenecked by the slow, manual creation of software to support computational experiments. To address this, we present an AI system that creates expert-level scientific software whose goal is to maximize a quality metric. The system uses a Large Language Model (LLM) and Tree Search (TS) to systematically improve the quality metric and intelligently navigate the large space of possible solutions. The system achieves expert-level results when it explores and integrates complex research ideas from external sources. The effectiveness of tree search is demonstrated across a wide range of benchmarks. In bioinformatics, it discovered 40 novel methods for single-cell data analysis that outperformed the top human-developed methods on a public leaderboard. In epidemiology, it generated 14 models that outperformed the CDC ensemble and all other individual models for forecasting COVID-19 hospitalizations. Our method also produced state-of-the-art software for geospatial analysis, neural activity prediction in zebrafish, time series forecasting and numerical solution of integrals. By devising and implementing novel solutions to diverse tasks, the system represents a significant step towards accelerating scientific progress.
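The search loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `Node`, `quality`, `propose`, `select`, and `tree_search` are hypothetical names; `propose` is a random perturbation standing in for an LLM rewriting a candidate program; a candidate is a list of numbers rather than source code; and the UCB-style selection rule is an assumed heuristic, since the abstract does not say how nodes are chosen for expansion.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """One candidate solution in the search tree."""
    candidate: List[float]
    score: float
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0


def quality(candidate: List[float]) -> float:
    """Hypothetical quality metric: higher is better, maximum 0 at all ones."""
    return -sum((x - 1.0) ** 2 for x in candidate)


def propose(candidate: List[float], rng: random.Random) -> List[float]:
    """Stand-in for the LLM rewrite step: perturb one coordinate at random."""
    new = list(candidate)
    i = rng.randrange(len(new))
    new[i] += rng.uniform(-0.5, 0.5)
    return new


def select(nodes: List[Node], c: float = 0.5) -> Node:
    """Pick a node to expand: high score plus a bonus for rarely expanded nodes.

    The UCB-style bonus is an assumption made for this sketch.
    """
    total = sum(n.visits for n in nodes) + 1
    return max(
        nodes,
        key=lambda n: n.score + c * math.sqrt(math.log(total + 1) / (n.visits + 1)),
    )


def tree_search(
    root_candidate: List[float],
    score_fn: Callable[[List[float]], float],
    propose_fn: Callable[[List[float], random.Random], List[float]],
    iterations: int = 200,
    seed: int = 0,
) -> Tuple[Node, List[Node]]:
    """Grow a tree of candidates and return the best node and all nodes."""
    rng = random.Random(seed)
    root = Node(root_candidate, score_fn(root_candidate))
    nodes = [root]
    best = root
    for _ in range(iterations):
        parent = select(nodes)
        parent.visits += 1
        child_candidate = propose_fn(parent.candidate, rng)
        child = Node(child_candidate, score_fn(child_candidate), parent=parent)
        parent.children.append(child)
        nodes.append(child)
        if child.score > best.score:
            best = child
    return best, nodes


if __name__ == "__main__":
    best, nodes = tree_search([0.0, 0.0, 0.0], quality, propose)
    print(f"explored {len(nodes)} candidates, best score {best.score:.4f}")
```

In a real system of the kind the abstract describes, `propose` would presumably prompt a language model with the parent program and relevant research ideas, and `quality` would run the generated program against the task's benchmark; both are replaced here by toy functions so that the sketch stays self-contained.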
Problem

Research questions and friction points this paper is trying to address.

Automating expert-level scientific software creation
Overcoming manual bottlenecks in computational experiments
Maximizing quality metrics via AI-driven code generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM and Tree Search for software generation
Systematically improves quality metrics
Integrates complex ideas from external sources
Authors
Eser Aygün
Google DeepMind
Pattern Recognition, Complex Systems, Reinforcement Learning, Automated Theorem Proving
Anastasiya Belyaeva
Google Research
Machine learning, Computational genomics
Gheorghe Comanici
Research Scientist, Google DeepMind
LLMs for Science, Reinforcement Learning, Hierarchical Behavior, Bisimulation metrics
Marc Coram
Google Research
Hao Cui
University of California, Irvine
privacy policy, image watermarking
Jake Garrison
Google Platforms and Devices
Renee Johnston
Google Research
Anton Kast
Google Research
Cory Y. McLean
Google Research
Peter Norgaard
Google Research
Zahra Shamsi
Google Research
David Smalling
Google DeepMind
James Thompson
Google Research
Subhashini Venugopalan
University of Texas at Austin
Natural Language Processing, Computer Vision, Machine Learning
Brian P. Williams
Google Research
Chujun He
Google Research, Massachusetts Institute of Technology
Sarah Martinson
Google Research, School of Engineering and Applied Sciences, Harvard University
Martyna Plomecka
Google Research, Google Cloud
Lai Wei
Google Research
Yuchen Zhou
Google Research
Qian-Ze Zhu
Google Research, School of Engineering and Applied Sciences, Harvard University
Matthew Abraham
Google Research
Erica Brand
Google Research
Anna Bulanova
Google DeepMind
Jeffrey A. Cardille
Google Research, Faculty of Agricultural and Environmental Sciences, McGill University
Chris Co
Google Research