🤖 AI Summary
Current automated scientific discovery (ASD) tools often produce hypotheses that lack testability, are insufficiently grounded in prior literature, and cannot be adaptively refined in response to experimental feedback. To address these limitations, we propose an AI-driven hypothesis generation framework that emulates the reasoning of human researchers—integrating literature mining, research gap identification, and exploration of the hypothesis design space, augmented by a novel reward modeling and dynamic learning mechanism conditioned on empirical results. Our approach significantly improves hypothesis feasibility (+0.78 on a 10-point Likert scale) and literature grounding (+0.85). Integrated with CodeScientist, it achieves a 50% execution rate (20/40), substantially outperforming baselines. The core contribution is the first closed-loop, feedback-driven paradigm for generating empirically testable hypotheses—jointly enhancing scientific creativity, verifiability, and knowledge groundedness.
📝 Abstract
While there has been a surge of interest in automated scientific discovery (ASD), especially with the emergence of LLMs, it remains challenging for tools to generate hypotheses that are both testable and grounded in the scientific literature. Additionally, existing ideation tools are not adaptive to prior experimental outcomes. We developed HARPA to address these challenges by incorporating an ideation workflow inspired by human researchers. HARPA first identifies emerging research trends through literature mining, then explores hypothesis design spaces, and finally converges on precise, testable hypotheses by pinpointing research gaps and justifying design choices. Our evaluations show that HARPA-generated hypothesis-driven research proposals perform comparably to a strong baseline AI researcher across most qualitative dimensions (e.g., specificity, novelty, overall quality), but achieve significant gains in feasibility (+0.78, $p<0.05$, bootstrap) and groundedness (+0.85, $p<0.01$, bootstrap) on a 10-point Likert scale. When tested with the ASD agent (CodeScientist), HARPA produced more successful executions (20 vs. 11 out of 40) and fewer failures (16 vs. 21 out of 40), showing that expert feasibility judgments track with actual execution success. Furthermore, to simulate how researchers continuously refine their understanding of which hypotheses are both testable and potentially interesting from experience, HARPA learns a reward model that scores new hypotheses based on prior experimental outcomes, achieving approximately a 28% absolute gain over HARPA's untrained baseline scorer. Together, these methods represent a step forward in the field of AI-driven scientific discovery.
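The abstract does not specify how HARPA's reward model is implemented. As an illustrative sketch only—not the paper's actual method—the closed-loop idea (score new hypotheses using a model refit on prior experimental outcomes) might look like the following, where the feature names, the logistic-regression scorer, and all numbers are hypothetical:

```python
import math

class HypothesisRewardModel:
    """Toy feedback-driven scorer (hypothetical sketch, not HARPA's code).
    Each hypothesis is a feature vector (e.g., literature-grounding score,
    specificity); the label records whether its experiment executed
    successfully. A simple logistic regression stands in for the learned
    reward model."""

    def __init__(self, n_features, lr=0.5, epochs=500):
        self.w = [0.0] * n_features  # feature weights
        self.b = 0.0                 # bias term
        self.lr = lr
        self.epochs = epochs

    def _sigmoid(self, z):
        return 1.0 / (1.0 + math.exp(-z))

    def score(self, x):
        """Predicted probability that the hypothesis will execute successfully."""
        return self._sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, X, y):
        """Refit on the accumulated (hypothesis, outcome) history via
        stochastic gradient descent on the log-loss."""
        for _ in range(self.epochs):
            for x, label in zip(X, y):
                err = self.score(x) - label
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err

# Hypothetical prior outcomes: features = [grounding, specificity],
# label 1 = experiment executed successfully, 0 = failed.
history_X = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.3], [0.1, 0.4]]
history_y = [1, 1, 0, 0]

model = HypothesisRewardModel(n_features=2)
model.update(history_X, history_y)

# Score two new candidate hypotheses.
well_grounded = model.score([0.85, 0.9])
poorly_grounded = model.score([0.15, 0.2])
```

After refitting on the outcome history, the model should rank the well-grounded candidate above the poorly grounded one; in the actual system, this loop would repeat as each new experimental result arrives.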