🤖 AI Summary
This study addresses critical challenges in leveraging large language models (LLMs) for scientific hypothesis generation and validation: weak interpretability, difficulty in ensuring novelty, and insufficient domain alignment. We propose the first end-to-end paradigm tailored for scientific discovery, integrating novelty-aware generation, multimodal-symbolic hybrid reasoning, human-in-the-loop validation, and ethical constraints. Technically, we unify retrieval-augmented generation, knowledge-graph completion, causal inference, simulation-based modeling, tool-augmented reasoning, and domain-adaptive fine-tuning. We introduce two novel benchmarks (AHTech, a hypothesis-generation evaluation benchmark, and CSKG-600, a causal scientific knowledge graph) and establish a cross-disciplinary evaluation framework across biomedical science, materials science, environmental science, and social science. Our systematic analysis reveals fundamental trade-offs among interpretability, novelty, and domain adaptability, offering principled guidance for trustworthy AI-driven scientific discovery.
📝 Abstract
Large Language Models (LLMs) are transforming scientific hypothesis generation and validation by enabling information synthesis, latent relationship discovery, and reasoning augmentation. This survey provides a structured overview of LLM-driven approaches, including symbolic frameworks, generative models, hybrid systems, and multi-agent architectures. We examine techniques such as retrieval-augmented generation, knowledge-graph completion, simulation, causal inference, and tool-assisted reasoning, highlighting trade-offs in interpretability, novelty, and domain alignment. We contrast early symbolic discovery systems (e.g., BACON, KEKADA) with modern LLM pipelines that leverage in-context learning and domain adaptation via fine-tuning, retrieval, and symbolic grounding. For validation, we review simulation, human-AI collaboration, causal modeling, and uncertainty quantification, emphasizing iterative assessment in open-world contexts. The survey maps datasets across biomedicine, materials science, environmental science, and social science, introducing new resources such as AHTech and CSKG-600. Finally, we outline a roadmap emphasizing novelty-aware generation, multimodal-symbolic integration, human-in-the-loop systems, and ethical safeguards, positioning LLMs as agents for principled, scalable scientific discovery.