An AI system to help scientists write expert-level empirical software

📅 2025-09-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Scientific discovery is slowed by the manual effort needed to develop high-quality scientific software. To address this, we propose an AI system that combines a large language model (LLM) with tree search (TS) to optimize a quantifiable quality metric, exploring, combining, and implementing methods from across domains to generate scientific software end to end. The system operates without human coding intervention, dynamically invoking external tools and domain knowledge while supporting multi-step reasoning and iterative validation. In bioinformatics, it generated 40 novel single-cell analysis methods that outperformed the top human-developed methods on a public leaderboard. In epidemiology, it generated 14 COVID-19 hospitalization forecasting models that outperformed the CDC ensemble and all other individual models. The work offers empirical evidence that AI-driven, automated generation of scientific software is feasible and can exceed expert-written baselines on these tasks.

📝 Abstract

The cycle of scientific discovery is frequently bottlenecked by the slow, manual creation of software to support computational experiments. To address this, we present an AI system that creates expert-level scientific software whose goal is to maximize a quality metric. The system uses a Large Language Model (LLM) and Tree Search (TS) to systematically improve the quality metric and intelligently navigate the large space of possible solutions. The system achieves expert-level results when it explores and integrates complex research ideas from external sources. The effectiveness of tree search is demonstrated across a wide range of benchmarks. In bioinformatics, it discovered 40 novel methods for single-cell data analysis that outperformed the top human-developed methods on a public leaderboard. In epidemiology, it generated 14 models that outperformed the CDC ensemble and all other individual models for forecasting COVID-19 hospitalizations. Our method also produced state-of-the-art software for geospatial analysis, neural activity prediction in zebrafish, time series forecasting and numerical solution of integrals. By devising and implementing novel solutions to diverse tasks, the system represents a significant step towards accelerating scientific progress.
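
As a rough illustration of the loop the abstract describes (propose a variant, score it against the quality metric, expand the most promising candidates), the sketch below runs a simple best-first tree search on a toy problem. Everything in it is an assumption made for illustration: `tree_search`, `propose_variant`, `quality`, `TARGET`, and the `budget` and `branching` parameters are hypothetical names, the candidate is a tuple of numbers rather than a program, and a random perturbation stands in for the LLM rewrite step. The search policy the paper actually uses is not specified in this summary and may differ.

```python
import heapq
import itertools
import random
from dataclasses import dataclass, field
from typing import Callable, Tuple

Candidate = Tuple[float, ...]

# Hypothetical optimum of the toy quality metric (not from the paper).
TARGET: Candidate = (1.0, -2.0, 0.5)


def quality(candidate: Candidate) -> float:
    """Toy quality metric: negative squared distance to TARGET (higher is better)."""
    return -sum((x - t) ** 2 for x, t in zip(candidate, TARGET))


def propose_variant(candidate: Candidate, rng: random.Random) -> Candidate:
    """Stand-in for the LLM rewrite step: randomly perturb one coordinate."""
    i = rng.randrange(len(candidate))
    mutated = list(candidate)
    mutated[i] += rng.gauss(0.0, 0.5)
    return tuple(mutated)


@dataclass(order=True)
class Node:
    neg_score: float  # heapq is a min-heap, so the score is stored negated
    order: int        # tie-breaker; keeps candidates out of the comparison
    candidate: Candidate = field(compare=False)


def tree_search(
    root: Candidate,
    score: Callable[[Candidate], float],
    propose: Callable[[Candidate, random.Random], Candidate],
    budget: int = 200,
    branching: int = 4,
    seed: int = 0,
) -> Tuple[float, Candidate]:
    """Best-first search: repeatedly expand the highest-scoring unexpanded node."""
    rng = random.Random(seed)
    counter = itertools.count()
    best_score, best_candidate = score(root), root
    frontier = [Node(-best_score, next(counter), root)]
    evaluations = 1

    while frontier and evaluations < budget:
        node = heapq.heappop(frontier)  # most promising candidate so far
        for _ in range(branching):
            child = propose(node.candidate, rng)
            child_score = score(child)
            evaluations += 1
            if child_score > best_score:
                best_score, best_candidate = child_score, child
            heapq.heappush(frontier, Node(-child_score, next(counter), child))
            if evaluations >= budget:
                break

    return best_score, best_candidate
```

Calling `tree_search((0.0, 0.0, 0.0), quality, propose_variant, budget=300)` returns the best score found and the candidate that achieved it. In a system like the one described, the scoring function would instead run the generated software on a benchmark, and the proposal step would ask an LLM to revise the code, possibly with research ideas from external sources included in the prompt.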

Problem

Research questions and friction points this paper is trying to address.

Automating expert-level scientific software creation

Overcoming manual bottlenecks in computational experiments

Maximizing quality metrics via AI-driven code generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM and Tree Search for software generation

Systematically improves quality metrics

Integrates complex ideas from external sources

🔎 Similar Papers

System for systematic literature review using multiple AI agents: Concept and an empirical evaluation