🤖 AI Summary
This work addresses a limitation of current large language models in scientific ideation: they often fail to reason deeply from research motivation to methodology, producing superficial proposals with insufficient technical depth. To overcome this, the authors propose the MoRI framework, which explicitly models scientific motivation as the starting point of reasoning. MoRI is initialized via supervised fine-tuning and further enhanced through a composite reinforcement learning strategy that integrates entropy-aware information gain and contrastive semantic gain to guide the generation of scientifically rigorous and conceptually coherent proposals. Experimental results demonstrate that MoRI significantly outperforms leading commercial large language models and sophisticated agent-based baselines in terms of novelty, technical rigor, and feasibility.
📝 Abstract
Scientific ideation aims to propose novel solutions within a given scientific context. Existing LLM-based agentic approaches emulate human research workflows, yet inadequately model scientific reasoning, resulting in surface-level conceptual recombinations that lack technical depth and scientific grounding. To address this issue, we propose \textbf{MoRI} (\textbf{Mo}tivation-grounded \textbf{R}easoning for Scientific \textbf{I}deation), a framework that enables LLMs to explicitly learn the reasoning process from research motivations to methodologies. The base LLM is initialized via supervised fine-tuning to generate a research motivation from a given context, and is subsequently trained under a composite reinforcement learning reward that approximates scientific rigor: (1) entropy-aware information gain encourages the model to uncover and elaborate high-complexity technical details grounded in ground-truth methodologies, and (2) contrastive semantic gain constrains the reasoning trajectory to remain conceptually aligned with scientifically valid solutions. Empirical results show that MoRI significantly outperforms strong commercial LLMs and complex agentic baselines across multiple dimensions, including novelty, technical rigor, and feasibility. The code will be made available on \href{https://github.com/ECNU-Text-Computing/IdeaGeneration}{GitHub}.
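The shape of such a composite reward can be sketched as below. The paper's exact formulation is not reproduced here: the measures used in this sketch (token-level Shannon entropy as a stand-in for information gain, bag-of-words cosine similarity for semantic alignment against a ground-truth methodology versus a negative sample, and the weight `alpha`) are illustrative assumptions, not MoRI's actual reward.

```python
import math
from collections import Counter

def token_entropy(text: str) -> float:
    """Shannon entropy (bits) of the token distribution.
    Used here as a crude proxy for the information content of a proposal."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def bow_vector(text: str, vocab: list) -> list:
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(u: list, v: list) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def contrastive_gain(proposal: str, positive: str, negative: str) -> float:
    """Similarity to the ground-truth methodology minus similarity to a
    negative (scientifically invalid or off-topic) sample."""
    vocab = sorted(set((proposal + " " + positive + " " + negative).lower().split()))
    p = bow_vector(proposal, vocab)
    return cosine(p, bow_vector(positive, vocab)) - cosine(p, bow_vector(negative, vocab))

def composite_reward(proposal: str, gold: str, negative: str, alpha: float = 0.5) -> float:
    """Illustrative composite reward: entropy-aware information term plus
    contrastive semantic term. `alpha` is a hypothetical mixing weight."""
    return alpha * token_entropy(proposal) + (1 - alpha) * contrastive_gain(proposal, gold, negative)
```

Under this sketch, a detailed proposal that stays close to the ground-truth methodology scores higher than a terse, off-topic one, which is the qualitative behavior the two reward components are meant to induce.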