Improving Research Idea Generation Through Data: An Empirical Investigation in Social Science

📅 2025-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the low feasibility and weak empirical grounding of social science research ideas generated by large language models (LLMs). Focusing on climate negotiations, it proposes a framework that brings data into two stages of ideation: metadata guidance during idea generation and automated empirical validation during idea selection. Methodologically: (1) domain-specific metadata, such as negotiating parties’ positions and treaty enforceability, is injected to enhance the practical feasibility of generated ideas; (2) an automated hypothesis validation module, grounded in knowledge graphs and empirical literature, assesses causal logic and evidentiary support; (3) human-in-the-loop evaluation corroborates validation outcomes. The key contribution is the integration of structured metadata guidance with interpretable, automated validation within the research ideation pipeline. Experiments show that metadata guidance improves idea feasibility by 20%, while automated validation improves the overall quality of selected ideas by 7%. A human study further indicates that the generated ideas, together with their related data and validation processes, inspire researchers to propose higher-quality research ideas, moving LLMs from generic idea generation toward verifiable, actionable research ideation.

📝 Abstract

Recent advancements in large language models (LLMs) have shown promise in generating novel research ideas. However, these ideas often face challenges related to feasibility and expected effectiveness. This paper explores how augmenting LLMs with relevant data during the idea generation process can enhance the quality of generated ideas. We introduce two ways of incorporating data: (1) providing metadata during the idea generation stage to guide LLMs toward feasible directions, and (2) adding automatic validation during the idea selection stage to assess the empirical plausibility of hypotheses within ideas. We conduct experiments in the social science domain, specifically with climate negotiation topics, and find that metadata improves the feasibility of generated ideas by 20%, while automatic validation improves the overall quality of selected ideas by 7%. A human study shows that LLM-generated ideas, along with their related data and validation processes, inspire researchers to propose research ideas with higher quality. Our work highlights the potential of data-driven research idea generation, and underscores the practical utility of LLM-assisted ideation in real-world academic settings.
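The first way of incorporating data, providing metadata at generation time, can be illustrated with a minimal sketch. This is not the authors' implementation: the `DatasetMetadata` class, the `build_idea_prompt` function, the prompt wording, and the example dataset are all hypothetical and only show how dataset descriptions might be placed in front of an LLM so that it proposes ideas that can be tested with available data.

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """Hypothetical description of one dataset available to the researcher."""
    name: str
    description: str
    variables: list[str]


def build_idea_prompt(topic: str, datasets: list[DatasetMetadata]) -> str:
    """Assemble a generation prompt that lists the available data.

    Assumption: the wording below is illustrative; the paper's actual
    prompts are not given in this summary.
    """
    lines = [
        f"Propose a research idea on the topic: {topic}.",
        "Prefer hypotheses that can be tested with the datasets listed below.",
        "Available datasets:",
    ]
    for ds in datasets:
        variables = ", ".join(ds.variables)
        lines.append(f"- {ds.name}: {ds.description} (variables: {variables})")
    return "\n".join(lines)


# Example usage with an invented dataset entry.
example_datasets = [
    DatasetMetadata(
        name="negotiation_positions",
        description="Stated positions of parties in climate negotiations",
        variables=["party", "year", "position"],
    )
]
example_prompt = build_idea_prompt("climate negotiation", example_datasets)
print(example_prompt)
```

The resulting string would then be sent to whichever LLM is used for generation; that call is left out here because the summary does not say which model or API the authors used.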

Problem

Research questions and friction points this paper is trying to address.

Enhancing research idea feasibility using LLMs and metadata

Improving idea quality via automatic validation in selection

Assessing data-driven LLM ideation in social science research

Innovation

Methods, ideas, or system contributions that make the work stand out.

Augmenting LLMs with metadata for feasibility

Adding automatic validation for empirical plausibility

Data-driven idea generation in social science
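The selection-stage validation listed above can be sketched in a similar spirit. This is only an illustration under stated assumptions: the `Idea` class, the `select_ideas` function, the `base_score` field, the equal weighting, and the stand-in `validate` callable are hypothetical, and the summary does not describe how the authors actually combine validation results with other selection signals.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Idea:
    """Hypothetical container for one generated research idea."""
    title: str
    base_score: float  # assumed judge score in [0, 1], e.g. from an LLM reviewer
    hypotheses: list[str] = field(default_factory=list)


def select_ideas(
    ideas: list[Idea],
    validate: Callable[[str], bool],
    top_k: int = 1,
    weight: float = 0.5,
) -> list[Idea]:
    """Rank ideas by a blend of base score and share of validated hypotheses.

    Assumption: `validate` returns True when a hypothesis is empirically
    plausible given the available data; how that check is done is not
    specified here.
    """
    def combined(idea: Idea) -> float:
        if idea.hypotheses:
            supported = sum(1 for h in idea.hypotheses if validate(h))
            support = supported / len(idea.hypotheses)
        else:
            support = 0.0  # nothing to validate counts as no support
        return (1 - weight) * idea.base_score + weight * support

    return sorted(ideas, key=combined, reverse=True)[:top_k]


# Example usage with a stand-in validator that accepts one known hypothesis.
supported_hypotheses = {"H_b"}
candidates = [
    Idea("Idea A", base_score=0.8, hypotheses=["H_a"]),
    Idea("Idea B", base_score=0.6, hypotheses=["H_b"]),
]
selected = select_ideas(candidates, lambda h: h in supported_hypotheses)
print([idea.title for idea in selected])
```

In this toy example the idea with the lower base score is selected because its hypothesis passes validation, which is the kind of re-ranking effect that validation at the selection stage is meant to produce.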

🔎 Similar Papers

Interesting Scientific Idea Generation using Knowledge Graphs and LLMs: Evaluations with 100 Research Group Leaders