LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context

📅 2024-12-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates large language models’ (LLMs) scientific creativity under information-minimal conditions—decoupling scientific creativity from general intelligence. Method: We introduce the first lightweight-context scientific creativity benchmark, eliciting original ideas from single keywords and evaluating them along four dimensions: originality, feasibility, fluency, and flexibility. Grounded in Guilford’s structure-of-intellect theory, we propose a dynamic LLM expert-panel scoring mechanism and multidimensional quantitative creativity metrics. The benchmark comprises 1,180 keywords spanning 18 scientific disciplines. Contribution/Results: Experiments across 20 mainstream LLMs reveal a nonlinear distribution of creative capability; notably, QwQ-32B matches o1-preview in divergent thinking. Results demonstrate that minimal prompting suffices to effectively elicit and quantify LLM-driven scientific creativity, establishing a scalable, theory-grounded framework for creativity assessment.

📝 Abstract
While Large Language Models (LLMs) have demonstrated remarkable capabilities in scientific tasks, existing evaluation frameworks primarily assess their performance using rich contextual inputs, overlooking their ability to generate novel ideas from minimal information. We introduce LiveIdeaBench, a comprehensive benchmark that evaluates LLMs' scientific creativity and divergent thinking capabilities using single-keyword prompts. Drawing from Guilford's creativity theory, our framework employs a dynamic panel of state-of-the-art LLMs to assess generated ideas across four key dimensions: originality, feasibility, fluency, and flexibility. Through extensive experimentation with 20 leading models across 1,180 keywords spanning 18 scientific domains, we reveal that scientific creative ability shows distinct patterns from general intelligence metrics. Notably, our results demonstrate that models like QwQ-32B-preview achieve comparable creative performance to top-tier models like o1-preview, despite significant gaps in their general intelligence scores. These findings highlight the importance of specialized evaluation frameworks for scientific creativity and suggest that the development of creative capabilities in LLMs may follow different trajectories than traditional problem-solving abilities.
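As a rough illustration of the panel-scoring idea described in the abstract, the sketch below averages each of the four creativity dimensions (originality, feasibility, fluency, flexibility) across a panel of LLM judges. The function name, the 0–10 score scale, and the equal-weight mean are illustrative assumptions, not the paper's actual implementation.

```python
from statistics import mean

# The four dimensions from Guilford's creativity framework used by LiveIdeaBench.
DIMENSIONS = ("originality", "feasibility", "fluency", "flexibility")

def aggregate_panel_scores(panel_scores):
    """Average each creativity dimension across a panel of LLM judges.

    panel_scores: {judge_name: {dimension: score}}, scores on a 0-10 scale.
    (Scale and equal-weight aggregation are assumptions for illustration.)
    """
    return {
        dim: mean(scores[dim] for scores in panel_scores.values())
        for dim in DIMENSIONS
    }

# Example: three hypothetical judges scoring one generated idea.
panel = {
    "judge_a": {"originality": 8, "feasibility": 6, "fluency": 7, "flexibility": 7},
    "judge_b": {"originality": 7, "feasibility": 5, "fluency": 8, "flexibility": 6},
    "judge_c": {"originality": 9, "feasibility": 6, "fluency": 7, "flexibility": 8},
}
print(aggregate_panel_scores(panel))
```

In the paper's actual setup the judge panel is dynamic (drawn from state-of-the-art LLMs), so a production version would also handle missing judges and per-dimension score normalization.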
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Scientific Innovation
Creativity vs General Intelligence
Innovation

Methods, ideas, or system contributions that make the work stand out.

LiveIdeaBench
Creativity Assessment
Intelligent Systems
Kai Ruan
Gaoling School of Artificial Intelligence, Renmin University of China
AI for Science; Symbolic regression
Xuan Wang
ZJU-UIUC Institute, Zhejiang University, Haining, China
Jixiang Hong
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China
Hao Sun
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China