Generating Literature-Driven Scientific Theories at Scale

📅 2026-01-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical limitation in current automated scientific discovery systems, which predominantly focus on experiment generation while neglecting higher-level cognitive tasks such as theory construction. To bridge this gap, the authors propose the first large-scale, literature-driven framework for automatic scientific theory generation, integrating large language models with external literature retrieval. The study systematically compares two strategies—parameterized memory versus literature-grounded synthesis—in constructing scientific theories. Leveraging a corpus of 13.7k research papers, the framework generates 2.9k candidate theories. Experimental results demonstrate that the literature-grounded approach significantly outperforms purely parameterized methods in aligning with existing evidence and accurately predicting findings reported in 4.6k subsequent publications, thereby validating its superior balance of accuracy and novelty.

📝 Abstract
Contemporary automated scientific discovery has focused on agents for generating scientific experiments, while systems that perform higher-level scientific activities such as theory building remain underexplored. In this work, we formulate the problem of synthesizing theories consisting of qualitative and quantitative laws from large corpora of scientific literature. We study theory generation at scale, using 13.7k source papers to synthesize 2.9k theories, examining how literature-grounded versus parametric generation, and accuracy-focused versus novelty-focused generation objectives, change theory properties. Our experiments show that, compared to using parametric LLM memory for generation, our literature-supported method creates theories that are significantly better both at matching existing evidence and at predicting future results from 4.6k subsequently written papers.
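The two strategies the abstract contrasts can be illustrated with a minimal sketch. Everything below is hypothetical: the `llm` call, the toy keyword retriever, and the corpus are stand-ins for the authors' actual system, shown only to clarify the difference between generating from parametric memory alone and grounding synthesis in retrieved literature.

```python
# Hypothetical sketch: parametric vs. literature-grounded theory generation.
# `llm`, `retrieve`, and the corpus are illustrative stand-ins, not the paper's code.
from dataclasses import dataclass, field


@dataclass
class Theory:
    statement: str
    supporting_papers: list = field(default_factory=list)


def llm(prompt: str) -> str:
    # Stand-in for a large language model call.
    return f"theory synthesized from: {prompt[:60]}"


def retrieve(corpus: dict, topic: str, k: int = 3) -> list:
    # Toy keyword retriever over {paper_id: abstract text}.
    hits = [pid for pid, text in corpus.items() if topic in text]
    return hits[:k]


def generate_parametric(topic: str) -> Theory:
    # Relies only on knowledge stored in the model's parameters;
    # no papers are cited as evidence.
    return Theory(llm(f"Propose a theory about {topic}."))


def generate_grounded(topic: str, corpus: dict) -> Theory:
    # Retrieves relevant papers first, then conditions generation on them,
    # so each theory carries its supporting literature.
    papers = retrieve(corpus, topic)
    context = " ".join(corpus[p] for p in papers)
    return Theory(llm(f"Synthesize a theory about {topic} from: {context}"), papers)


corpus = {
    "p1": "spaced repetition improves long-term retention",
    "p2": "retention declines without spaced practice",
}
theory = generate_grounded("retention", corpus)
print(len(theory.supporting_papers))  # 2
```

The key design difference is that the grounded variant returns an evidence trail (`supporting_papers`), which is what makes checking a theory against existing and future publications possible at all.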
Problem

Research questions and friction points this paper is trying to address.

scientific theory generation
literature-based synthesis
automated scientific discovery
large language models
knowledge grounding
Innovation

Methods, ideas, or system contributions that make the work stand out.

literature-grounded theory generation
automated scientific discovery
large-scale theory synthesis
LLM-based scientific reasoning
evidence-based prediction
Peter Jansen
Allen Institute for Artificial Intelligence, University of Arizona
Peter Clark
Allen Institute for Artificial Intelligence (AI2)
Artificial Intelligence
Doug Downey
Allen Institute for AI and Northwestern University
Natural Language Processing, Machine Learning, Artificial Intelligence
Daniel S. Weld
Allen Institute for Artificial Intelligence, University of Washington