🤖 AI Summary
Existing evaluations of LLM-based idea generation suffer from knowledge leakage, a lack of realistic benchmarks, and rigid feasibility analysis. To address these issues, we introduce the first open, verifiable benchmark for AI research idea generation. Our method proposes a two-dimensional evaluation framework, measuring both alignment with the original papers and cross-literature plausibility, and provides a ground-truth dataset of 3,495 AI papers together with the works they inspired. Technically, we integrate semantic alignment modeling, cross-literature consistency verification, and structured feasibility scoring, thereby overcoming the limitations of prompt-engineered feasibility assessment. The framework enables quantitative, reproducible evaluation of generated ideas along three core dimensions: quality, originality, and feasibility. Empirical results demonstrate measurable gains in automated scientific discovery capability. This work establishes a new paradigm for AI-driven research innovation.
📝 Abstract
Large Language Models (LLMs) have revolutionized human-AI interaction and achieved significant success in generating novel ideas. However, current assessments of idea generation overlook crucial factors such as knowledge leakage in LLMs, the absence of open-ended benchmarks with ground truth, and the limited scope of feasibility analysis constrained by prompt design. These limitations hinder the discovery of groundbreaking research ideas. In this paper, we present AI Idea Bench 2025, a framework designed to quantitatively evaluate and compare ideas generated by LLMs within the domain of AI research from diverse perspectives. The framework comprises a comprehensive dataset of 3,495 AI papers and their associated inspired works, along with a robust evaluation methodology. This evaluation system gauges idea quality along two dimensions: alignment with the ground-truth content of the original papers, and judgment based on general reference material. AI Idea Bench 2025's benchmarking system stands to be an invaluable resource for assessing and comparing idea-generation techniques, thereby facilitating the automation of scientific discovery.
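To make the two-dimensional evaluation concrete, here is a minimal, hypothetical sketch of how a generated idea could be scored against (1) the ground-truth paper and (2) general reference material, then combined into an overall score. The metric choices (bag-of-words cosine similarity as a stand-in for semantic alignment, best-match plausibility over references, equal weighting) and all function names are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical two-dimension idea scoring: alignment with a ground-truth
# paper plus plausibility against general reference material.
# Bag-of-words cosine is a stand-in for a real semantic-similarity model.
import math
from collections import Counter


def cosine_sim(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two texts (stand-in metric)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def score_idea(idea: str, ground_truth: str, references: list[str],
               w_align: float = 0.5) -> dict:
    """Combine ground-truth alignment and cross-reference plausibility.

    alignment:    similarity to the original paper's content.
    plausibility: best-match similarity against general reference texts.
    overall:      weighted combination (w_align is an assumed knob).
    """
    alignment = cosine_sim(idea, ground_truth)
    plausibility = (max(cosine_sim(idea, r) for r in references)
                    if references else 0.0)
    overall = w_align * alignment + (1.0 - w_align) * plausibility
    return {"alignment": alignment,
            "plausibility": plausibility,
            "overall": overall}
```

In practice, the similarity function would be replaced by an embedding model or an LLM judge, and the weighting calibrated per dimension; the sketch only shows the shape of a reproducible, quantitative scorer.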