🤖 AI Summary
This work addresses the challenge of directly training models for scientific hypothesis generation, formalized as $P(\text{hypothesis}|\text{background})$, which is hindered by combinatorial complexity scaling as $O(N^k)$. To overcome this, we propose the MOOSE-Star framework, which combines probabilistic decomposition of the discovery equation, motivation-guided hierarchical search, and bounded composition to reduce complexity from exponential to logarithmic. This enables, for the first time, end-to-end trainable modeling of the hypothesis generation process. Evaluated on TOMATO-Star, a large-scale dataset of 108,717 decomposed scientific papers, MOOSE-Star breaks through the “complexity wall” inherent in conventional brute-force sampling, enabling efficient, scalable generation of scientific hypotheses with continuous test-time scaling.
📝 Abstract
While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning process, $P(\text{hypothesis}|\text{background})$ ($P(h|b)$), unexplored. We demonstrate that directly training $P(h|b)$ is mathematically intractable due to the combinatorial complexity ($O(N^k)$) inherent in retrieving and composing inspirations from a vast knowledge base. To break this barrier, we introduce MOOSE-Star, a unified framework enabling tractable training and scalable inference. In the best case, MOOSE-Star reduces complexity from exponential to logarithmic ($O(\log N)$) by (1) training on decomposed subtasks derived from the probabilistic equation of discovery, (2) employing motivation-guided hierarchical search to enable logarithmic retrieval and prune irrelevant subspaces, and (3) utilizing bounded composition for robustness against retrieval noise. To facilitate this, we release TOMATO-Star, a dataset of 108,717 decomposed papers (38,400 GPU hours) for training. Furthermore, we show that while brute-force sampling hits a “complexity wall,” MOOSE-Star exhibits continuous test-time scaling.
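To give intuition for why hierarchical search turns an $O(N)$ scan into an $O(\log N)$ retrieval, here is a minimal, hypothetical sketch (not the paper's implementation): documents are represented as scalar "embeddings," internal tree nodes store their subtree's centroid, and retrieval greedily descends toward the child whose centroid best matches the query motivation, scoring only two candidates per level.

```python
import math

# Toy sketch (hypothetical, not MOOSE-Star's actual code): hierarchical
# retrieval over N documents embedded as scalars. A flat scan would score
# all N leaves; greedy descent scores one pair of children per level,
# i.e. about log2(N) comparisons.

def build(docs):
    """Build a balanced binary tree over sorted scalar 'embeddings'."""
    if len(docs) == 1:
        return {"centroid": docs[0], "doc": docs[0]}
    mid = len(docs) // 2
    return {
        "centroid": sum(docs) / len(docs),
        "children": (build(docs[:mid]), build(docs[mid:])),
    }

def retrieve(node, query):
    """Descend the tree, picking the closer-centroid child at each level."""
    comparisons = 0
    while "children" in node:
        left, right = node["children"]
        comparisons += 1
        if abs(left["centroid"] - query) <= abs(right["centroid"] - query):
            node = left
        else:
            node = right
    return node["doc"], comparisons

N = 1024
tree = build(list(range(N)))
doc, comparisons = retrieve(tree, 700.3)
print(doc, comparisons)  # nearest document found in log2(N) comparisons
assert comparisons == int(math.log2(N))
```

The same pruning idea underlies the framework's claim: each level of the hierarchy discards the irrelevant half of the knowledge base, so retrieval cost grows with the tree depth rather than with the corpus size.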