Spark: A System for Scientifically Creative Idea Generation

📅 2025-04-25
🤖 AI Summary
To address the scarcity of research ideas in scientific domains, this paper proposes the first end-to-end framework for scientific idea generation and evaluation grounded in computational creativity (CC) theory. Methodologically, it integrates a retrieval-augmented generation (RAG) module to enhance scientific relevance and introduces Judge—a discriminative review model trained on 600K real peer-review records—to assess ideas across multiple dimensions: novelty, feasibility, and alignment with expert reviews. Key contributions include: (1) the first systematic incorporation of CC principles into a closed-loop idea generation–evaluation pipeline; and (2) the open-sourcing of a large-scale, human-annotated peer-review dataset and associated model APIs. Experiments demonstrate statistically significant improvements over baselines across multiple idea-quality metrics, advancing the field toward evaluable and interpretable scientific ideation.

📝 Abstract
Recently, large language models (LLMs) have shown promising abilities to generate novel research ideas in science, a direction that coincides with many foundational principles in computational creativity (CC). In light of these developments, we present an idea generation system named Spark that couples retrieval-augmented idea generation using LLMs with a reviewer model named Judge, trained on 600K scientific reviews from OpenReview. Our work serves both as a system demonstration and as inspiration for other CC researchers to explore grounding the generation and evaluation of scientific ideas in foundational CC principles. To this end, we release the annotated dataset used to train Judge, inviting other researchers to explore the use of LLMs for idea generation and creative evaluations.
Problem

Research questions and friction points this paper is trying to address.

Develops Spark system for generating novel scientific ideas using LLMs
Integrates retrieval-augmented generation with a trained reviewer model
Encourages computational creativity research via dataset release
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-augmented LLM idea generation
Reviewer model trained on OpenReview data
Dataset release for creative evaluations
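
The coupling of retrieval-augmented generation with a reviewer model can be illustrated with a minimal sketch. This is not Spark's implementation: every name here (`retrieve`, `generate_ideas`, `judge`, `generate_and_rank`) is hypothetical, retrieval is plain word overlap, the generator is a template standing in for an LLM call, and the scoring function is a toy novelty proxy standing in for a trained reviewer such as Judge.

```python
from dataclasses import dataclass


@dataclass
class ScoredIdea:
    text: str
    score: float


def tokenize(text: str) -> set[str]:
    # Lowercase words with surrounding punctuation and quotes stripped.
    return {w.strip(".,;:!?'\"").lower() for w in text.split()} - {""}


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Toy retriever (assumption): rank documents by word overlap with the query.
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]


def generate_ideas(topic: str, context: list[str]) -> list[str]:
    # Placeholder for an LLM call: one templated idea per retrieved document.
    return [f"Apply insights from '{doc}' to {topic}" for doc in context]


def judge(idea: str, known_work: list[str]) -> float:
    # Placeholder for a reviewer model: fraction of idea words absent
    # from the known corpus, used here as a crude novelty score in [0, 1].
    idea_words = tokenize(idea)
    seen = set().union(*(tokenize(d) for d in known_work))
    return len(idea_words - seen) / max(len(idea_words), 1)


def generate_and_rank(topic: str, corpus: list[str], k: int = 2) -> list[ScoredIdea]:
    # Retrieve context, generate candidate ideas, then score and rank them.
    context = retrieve(topic, corpus, k)
    ideas = generate_ideas(topic, context)
    scored = [ScoredIdea(idea, judge(idea, corpus)) for idea in ideas]
    return sorted(scored, key=lambda s: s.score, reverse=True)


if __name__ == "__main__":
    corpus = [
        "graph neural networks for molecules",
        "peer review of scientific papers",
        "cooking pasta at home",
    ]
    for item in generate_and_rank("scientific peer review automation", corpus):
        print(f"{item.score:.2f}  {item.text}")
```

In a real system each placeholder would be swapped for a stronger component (a dense retriever, an LLM, a learned reviewer); the sketch only shows how the generation and evaluation stages connect.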
Aishik Sanyal
Spiral Works
Samuel Schapiro
Univ. Illinois, Urbana-Champaign, Spiral Works
Sumuk Shashidhar
HuggingFace
large language models
Royce Moon
University of Michigan, Spiral Works
L. Varshney
Univ. Illinois, Urbana-Champaign
Dilek Hakkani-Tur
Univ. Illinois, Urbana-Champaign