🤖 AI Summary
This work addresses a critical limitation in current automated scientific discovery systems, which predominantly focus on experiment generation while neglecting higher-level cognitive tasks such as theory construction. To bridge this gap, the authors propose the first large-scale, literature-driven framework for automatic scientific theory generation, integrating large language models with external literature retrieval. The study systematically compares two strategies for constructing scientific theories: generation from parametric memory and literature-grounded synthesis. Leveraging a corpus of 13.7k research papers, the framework generates 2.9k candidate theories. Experimental results show that the literature-grounded approach significantly outperforms the purely parametric one, both in aligning with existing evidence and in predicting findings reported in 4.6k subsequent publications, thereby validating its superior balance of accuracy and novelty.
📝 Abstract
Contemporary automated scientific discovery has focused on agents for generating scientific experiments, while systems that perform higher-level scientific activities such as theory building remain underexplored. In this work, we formulate the problem of synthesizing theories consisting of qualitative and quantitative laws from large corpora of scientific literature. We study theory generation at scale, using 13.7k source papers to synthesize 2.9k theories, and examine how literature-grounded versus parametric-knowledge generation, and accuracy-focused versus novelty-focused generation objectives, change theory properties. Our experiments show that, compared to using parametric LLM memory for generation, our literature-supported method creates theories that are significantly better both at matching existing evidence and at predicting future results from 4.6k subsequently written papers.
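To make the contrast between the two generation strategies concrete, here is a minimal, illustrative Python sketch. The `call_llm` placeholder, the prompts, and the toy keyword retriever are assumptions for illustration only, not the authors' actual pipeline: the key difference it shows is that the literature-grounded variant conditions the model on retrieved source papers rather than on parametric memory alone.

```python
# Illustrative sketch of parametric vs. literature-grounded theory generation.
# `call_llm` stands in for any chat-completion client (an assumption); the
# keyword retriever is a toy stand-in for a real dense/BM25 retrieval system.
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    abstract: str


def call_llm(prompt: str) -> str:
    """Placeholder for an LLM completion call; wire up your own client here."""
    raise NotImplementedError


def retrieve(corpus: list[Paper], topic: str, k: int = 5) -> list[Paper]:
    """Toy retriever: rank papers by keyword frequency and keep the top k."""
    scored = sorted(
        corpus,
        key=lambda p: p.abstract.lower().count(topic.lower()),
        reverse=True,
    )
    return scored[:k]


def parametric_theory(topic: str) -> str:
    """Generate a theory from the model's internal (parametric) memory only."""
    return call_llm(
        f"Propose a scientific theory, consisting of qualitative and "
        f"quantitative laws, about: {topic}"
    )


def literature_grounded_theory(topic: str, corpus: list[Paper]) -> str:
    """Generate a theory conditioned on retrieved source papers."""
    evidence = "\n\n".join(f"{p.title}: {p.abstract}" for p in retrieve(corpus, topic))
    return call_llm(
        f"Using only the evidence below, synthesize a theory consisting of "
        f"qualitative and quantitative laws about {topic}.\n\nEvidence:\n{evidence}"
    )
```

Under this framing, the paper's evaluation amounts to scoring both kinds of generated theories against held-out evidence: agreement with the existing 13.7k-paper corpus and predictive accuracy on the 4.6k papers written afterward.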