AI Summary
To address low efficiency and poor interpretability in scientific hypothesis generation under information overload and disciplinary fragmentation, this survey establishes a unified taxonomy of methodologies for LLM-driven hypothesis generation, distinguishing two quality-enhancement pathways: novelty boosting and structured reasoning. Methodologically, it covers prompt engineering, chain-of-thought reasoning, reflection mechanisms, retrieval-augmented generation (RAG), and multi-dimensional evaluation metrics, while identifying promising directions in multimodal fusion and interpretable human-AI collaboration. Contributions include: (1) a comprehensive overview of methodologies, evaluation criteria, and open challenges; and (2) a theoretically grounded yet practically viable framework for AI-augmented scientific discovery aimed at improving the novelty, credibility, and reusability of generated hypotheses.
Abstract
Hypothesis generation is a fundamental step in scientific discovery, yet it is increasingly challenged by information overload and disciplinary fragmentation. Recent advances in Large Language Models (LLMs) have sparked growing interest in their potential to enhance and automate this process. This paper presents a comprehensive survey of hypothesis generation with LLMs by (i) reviewing existing methods, from simple prompting techniques to more complex frameworks, and proposing a taxonomy that categorizes these approaches; (ii) analyzing techniques for improving hypothesis quality, such as novelty boosting and structured reasoning; (iii) providing an overview of evaluation strategies; and (iv) discussing key challenges and future directions, including multimodal integration and human-AI collaboration. Our survey aims to serve as a reference for researchers exploring LLMs for hypothesis generation.