🤖 AI Summary
The exponential growth of scientific literature has intensified the challenges of cross-domain knowledge integration and hypothesis generation. Method: This study systematically reviews the evolution of literature-based discovery (LBD) paradigms since 2000 and proposes, for the first time, a large language model (LLM)-driven end-to-end LBD framework that eliminates reliance on structured databases and manual annotation. The approach integrates knowledge graph construction, deep learning, and multi-stage collaborative LLM reasoning. Contribution/Results: The framework significantly improves coverage and interpretability in detecting latent cross-domain associations. Empirical validation in the biomedical domain demonstrates its effectiveness and reusability for hypothesis generation. By enabling scalable, interpretable, and domain-agnostic discovery from unstructured text, it establishes a novel paradigm and a practical pathway for literature-driven scientific discovery.
📝 Abstract
The explosive growth of scientific publications has created an urgent need for automated methods that facilitate knowledge synthesis and hypothesis generation. Literature-based discovery (LBD) addresses this challenge by uncovering previously unknown associations between disparate domains. This article surveys recent methodological advances in LBD, focusing on developments from 2000 to the present. We review progress in three key areas: knowledge graph construction, deep learning approaches, and the integration of pre-trained language models (PLMs) and large language models (LLMs). While LBD has made notable progress, several fundamental challenges remain unresolved, particularly concerning scalability, reliance on structured data, and the need for extensive manual curation. By examining ongoing advances and outlining promising future directions, this survey underscores the transformative role of LLMs in enhancing LBD and aims to support researchers and practitioners in harnessing these technologies to accelerate scientific innovation.
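To ground the idea of "uncovering previously unknown associations" in something concrete: most LBD systems, from the classical pipelines to the LLM-driven approaches surveyed here, build on Swanson's ABC model, in which a hidden A-C link is hypothesized when A and C each co-occur with shared bridge terms B but never with each other. The following is a minimal, hypothetical sketch of that pattern over toy co-occurrence data; the term sets and function names are illustrative inventions, not the framework proposed in this article.

```python
# Illustrative sketch only -- NOT the authors' implementation. It shows the
# Swanson-style "ABC" pattern underlying most LBD systems: if A co-occurs
# with B, and B co-occurs with C, but A and C never co-occur, then A-C is a
# candidate hidden association worth proposing as a hypothesis.
from collections import defaultdict
from itertools import combinations

# Toy "abstracts", each reduced to a set of normalized domain terms.
# In a real pipeline these would come from entity extraction over raw text.
documents = [
    {"fish_oil", "blood_viscosity"},
    {"blood_viscosity", "raynaud_syndrome"},
    {"fish_oil", "platelet_aggregation"},
    {"platelet_aggregation", "raynaud_syndrome"},
]

def build_cooccurrence_graph(docs):
    """Edge (x, y) exists iff terms x and y appear in the same document."""
    graph = defaultdict(set)
    for terms in docs:
        for x, y in combinations(sorted(terms), 2):
            graph[x].add(y)
            graph[y].add(x)
    return graph

def abc_candidates(graph, a_term):
    """Open discovery: rank C terms reachable from A only via bridge terms B."""
    ranked = defaultdict(set)
    for b in graph[a_term]:                             # B: directly linked to A
        for c in graph[b]:                              # C: directly linked to B
            if c != a_term and c not in graph[a_term]:  # A and C never co-occur
                ranked[c].add(b)                        # record supporting bridge
    # More supporting B terms -> a stronger, more interpretable hypothesis.
    return sorted(ranked.items(), key=lambda kv: -len(kv[1]))

graph = build_cooccurrence_graph(documents)
for c, bridges in abc_candidates(graph, "fish_oil"):
    print(f"hypothesis: fish_oil -- {c}  (via {sorted(bridges)})")
```

Run on these toy documents, the sketch recovers Swanson's classic fish oil / Raynaud's syndrome hypothesis via the two bridge terms. An LLM-driven, end-to-end framework of the kind discussed in this survey would replace the hand-built term sets with extraction from unstructured text and add reasoning stages that filter and explain candidate associations in natural language.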