🤖 AI Summary
This study aims to enable AI systems to autonomously formulate and validate scientifically meaningful research questions in open-ended scientific discovery—without reliance on human-specified hypotheses or directional priors.
Method: We propose a Bayesian-surprise-driven autonomous exploration framework that, for the first time, employs Bayesian surprise as a reward signal in open-ended scientific discovery. The framework integrates large language models (LLMs) for hypothesis generation, Bayesian updating for evidence evaluation, and Monte Carlo Tree Search (MCTS) with progressive widening to efficiently navigate multi-layered, nested hypothesis spaces.
Contribution/Results: Evaluated on 21 real-world datasets under fixed computational budgets, our framework discovers 5–29% more high-surprise hypotheses than baselines based on diversity or subjective interestingness. Approximately 67% of the top-ranked hypotheses are rated “surprising” by domain experts, demonstrating substantial improvements in both the efficiency and the scientific quality of goal-free discovery.
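The summary does not spell out how Bayesian surprise is computed; one common formulation, which we sketch here purely for illustration, treats the model's belief that a hypothesis is true as a Bernoulli probability and scores surprise as the KL divergence from the prior belief to the posterior belief after seeing experimental evidence. The function name and the example probabilities below are our own assumptions, not values from the paper.

```python
import math

def bayesian_surprise(prior: float, posterior: float) -> float:
    """Illustrative Bayesian surprise: KL divergence D(posterior || prior)
    between two Bernoulli beliefs about a hypothesis being true, in nats.
    (A sketch; the paper's exact belief representation may differ.)"""
    def term(p: float, q: float) -> float:
        # 0 * log(0/q) is taken as 0 by convention.
        return 0.0 if p == 0.0 else p * math.log(p / q)
    return term(posterior, prior) + term(1.0 - posterior, 1.0 - prior)

# Evidence that flips a weakly held belief yields high surprise:
print(round(bayesian_surprise(0.3, 0.9), 3))   # 0.794
# Evidence that barely moves the belief yields low surprise:
print(round(bayesian_surprise(0.5, 0.55), 3))  # 0.005
```

Under this reading, hypotheses whose experimental outcomes most shift the LLM's belief receive the highest reward, which is exactly the signal the MCTS search optimizes.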
📝 Abstract
The promise of autonomous scientific discovery (ASD) hinges not only on answering questions, but also on knowing which questions to ask. Most recent works in ASD explore the use of large language models (LLMs) in goal-driven settings, relying on human-specified research questions to guide hypothesis generation. However, scientific discovery may be accelerated further by allowing the AI system to drive exploration by its own criteria. The few existing approaches in open-ended ASD select hypotheses based on diversity heuristics or subjective proxies for human interestingness, but the former struggles to meaningfully navigate the typically vast hypothesis space, and the latter suffers from imprecise definitions. This paper presents AutoDS -- a method for open-ended ASD that instead drives scientific exploration using Bayesian surprise. Here, we quantify the epistemic shift from the LLM's prior beliefs about a hypothesis to its posterior beliefs after gathering experimental results. To efficiently explore the space of nested hypotheses, our method employs a Monte Carlo tree search (MCTS) strategy with progressive widening using surprisal as the reward function. We evaluate AutoDS in the setting of data-driven discovery across 21 real-world datasets spanning domains such as biology, economics, finance, and behavioral science. Our results demonstrate that under a fixed budget, AutoDS substantially outperforms competitors by producing 5--29% more discoveries deemed surprising by the LLM. Our human evaluation further finds that two-thirds of AutoDS discoveries are surprising to the domain experts, suggesting an important step toward building open-ended ASD systems.
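The abstract mentions MCTS with progressive widening but does not give the expansion rule. A standard progressive-widening criterion, sketched below under our own assumptions (the constants `c` and `alpha` and the function name are illustrative, not taken from the paper), caps a node's number of children as a slowly growing function of its visit count, so that a vast or nested hypothesis space is expanded gradually rather than enumerated up front.

```python
def may_expand(num_children: int, num_visits: int,
               c: float = 1.0, alpha: float = 0.5) -> bool:
    """Standard progressive-widening rule: permit sampling a new child
    hypothesis only while |children| < c * visits**alpha. With alpha < 1,
    the branching factor grows sublinearly in the visit count, keeping
    the search focused under a fixed budget. (Illustrative constants.)"""
    return num_children < c * num_visits ** alpha

# With c=1, alpha=0.5, a node visited 9 times may hold at most 3 children:
print(may_expand(2, 9))  # True  -> a new hypothesis may be generated
print(may_expand(3, 9))  # False -> keep refining existing children
```

In a surprise-driven search like the one described, each newly expanded child would correspond to an LLM-generated hypothesis, and backed-up rewards would be the surprise scores of its experimental outcomes.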