MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback

📅 2025-05-23

🤖 AI Summary

Wet-lab experiments are costly and low-throughput, which makes validating hypotheses slow. To address this, the paper proposes an experiment-guided hypothesis ranking paradigm. It formally defines this task for the first time, develops an interpretable *in silico* hypothesis simulator that combines domain knowledge with noise-aware modeling, and designs a ranking method that clusters hypotheses by shared functional characteristics and prioritizes them using simulated experimental feedback. On a curated dataset of 124 chemistry hypotheses with experimentally reported outcomes, the approach outperforms baselines that rely solely on the internal reasoning of large language models, and ablation studies confirm the contribution of each component. The results suggest that simulated feedback helps bridge the gap between model-internal reasoning and empirical constraints while keeping the ranking interpretable, pointing toward scientific discovery driven jointly by data and experiments.

📝 Abstract

Hypothesis ranking is a crucial component of automated scientific discovery, particularly in natural sciences where wet-lab experiments are costly and throughput-limited. Existing approaches focus on pre-experiment ranking, relying solely on large language models' internal reasoning without incorporating empirical outcomes from experiments. We introduce the task of experiment-guided ranking, which aims to prioritize candidate hypotheses based on the results of previously tested ones. However, developing such strategies is challenging due to the impracticality of repeatedly conducting real experiments in natural science domains. To address this, we propose a simulator grounded in three domain-informed assumptions, modeling hypothesis performance as a function of similarity to a known ground-truth hypothesis, perturbed by noise. We curate a dataset of 124 chemistry hypotheses with experimentally reported outcomes to validate the simulator. Building on this simulator, we develop a pseudo experiment-guided ranking method that clusters hypotheses by shared functional characteristics and prioritizes candidates based on insights derived from simulated experimental feedback. Experiments show that our method outperforms pre-experiment baselines and strong ablations.
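
The simulator's central idea, that a hypothesis scores higher the more it resembles the known ground-truth hypothesis, up to noise, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Hypothesis` class, representing a hypothesis as a set of key components, Jaccard similarity, and additive Gaussian noise clipped to [0, 1] are all hypothetical choices made here for concreteness, and the paper's three domain-informed assumptions are not reproduced.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hypothesis:
    name: str
    components: frozenset[str]  # hypothetical representation: key chemical components


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Set overlap in [0, 1]; a stand-in similarity measure (assumption)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def simulate_outcome(
    h: Hypothesis,
    ground_truth: Hypothesis,
    noise_std: float = 0.05,
    rng: Optional[random.Random] = None,
) -> float:
    """Simulated performance: similarity to the ground truth plus Gaussian noise, clipped to [0, 1]."""
    if rng is None:
        rng = random.Random()
    score = jaccard(h.components, ground_truth.components) + rng.gauss(0.0, noise_std)
    return min(1.0, max(0.0, score))


# Toy usage with hypothetical component names.
gt = Hypothesis("ground_truth", frozenset({"catalyst_A", "ligand_B", "solvent_C"}))
cand = Hypothesis("candidate", frozenset({"catalyst_A", "ligand_B", "solvent_D"}))
print(simulate_outcome(cand, gt, noise_std=0.0))  # 0.5 (2 shared of 4 total components)
print(simulate_outcome(cand, gt, rng=random.Random(0)))  # noisy, reproducible with the seed
```

Setting `noise_std=0.0` exposes the pure similarity signal; a nonzero value mimics the experimental noise the simulator is meant to capture.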

Problem

Research questions and friction points this paper is trying to address.

Automating hypothesis ranking for costly wet-lab experiments

Incorporating experimental feedback to improve hypothesis prioritization

Simulating experiments to validate ranking strategies without real tests

Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulator models hypothesis performance as a function of similarity to a known ground-truth hypothesis, perturbed by noise

Clusters hypotheses by shared functional characteristics

Prioritizes candidates using simulated experimental feedback
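
The clustering and feedback steps listed above can be illustrated with a toy experiment-guided ranking loop. This sketch swaps in a simple greedy heuristic for the paper's LLM-derived insights: hypotheses are grouped by a hypothetical functional label, every cluster is probed once, and afterwards the next hypothesis comes from the cluster with the highest mean observed outcome. `run_experiment` stands in for the simulator (or a real result), and all function names, labels, and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Hashable, TypeVar

H = TypeVar("H", bound=Hashable)


def cluster_by_function(hyps: list[H], function_of: Callable[[H], str]) -> dict[str, list[H]]:
    """Group hypotheses by a (hypothetical) shared functional label, keeping input order."""
    clusters: dict[str, list[H]] = defaultdict(list)
    for h in hyps:
        clusters[function_of(h)].append(h)
    return dict(clusters)


def experiment_guided_order(
    hyps: list[H],
    function_of: Callable[[H], str],
    run_experiment: Callable[[H], float],
    budget: int,
) -> list[tuple[H, float]]:
    """Greedy stand-in for feedback-guided ranking: test up to `budget` hypotheses."""
    remaining = cluster_by_function(hyps, function_of)
    observed: dict[str, list[float]] = defaultdict(list)  # cluster -> outcomes seen so far
    tested: list[tuple[H, float]] = []

    def priority(label: str) -> float:
        # Unprobed clusters get +inf so each is tried once before exploiting feedback.
        return mean(observed[label]) if observed[label] else float("inf")

    while len(tested) < budget and any(remaining.values()):
        # max() keeps the first cluster (insertion order) among ties.
        label = max((lab for lab, members in remaining.items() if members), key=priority)
        h = remaining[label].pop(0)  # within a cluster, follow input order (assumption)
        outcome = run_experiment(h)
        observed[label].append(outcome)
        tested.append((h, outcome))
    return tested


# Toy usage with made-up labels and outcomes (not from the paper's dataset).
labels = {"h1": "oxidant", "h2": "oxidant", "h3": "ligand", "h4": "ligand", "h5": "solvent"}
outcomes = {"h1": 0.2, "h2": 0.3, "h3": 0.7, "h4": 0.9, "h5": 0.1}
order = experiment_guided_order(list(labels), labels.__getitem__, outcomes.__getitem__, budget=5)
print([h for h, _ in order])  # ['h1', 'h3', 'h5', 'h4', 'h2']
```

Probing every cluster once before exploiting is just one simple design choice; because the paper derives its prioritization insights from feedback rather than from a fixed rule, this selection rule should be read only as an illustration of the loop's shape.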

🔎 Similar Papers

MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses