Continuous Knowledge Metabolism: Generating Scientific Hypotheses from Evolving Literature

📅 2026-04-13
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Scientific hypothesis generation requires modeling the dynamic evolution of knowledge rather than relying on static snapshots of the literature. This work proposes a Continuous Knowledge Metabolism (CKM) framework that incrementally processes scientific publications through a sliding time window to construct a structured knowledge base, enabling hypothesis generation grounded in trajectories of knowledge evolution, such as novelty, corroboration, and contradiction. Experimental results demonstrate that the lightweight variant, CKM-Lite, significantly outperforms batch-processing baselines in hypothesis hit rate, output volume, and alignment quality while reducing token consumption by 92%. The full-fledged CKM-Full further uncovers a critical trade-off between hypothesis quality and coverage and highlights the pivotal role of domain stability in determining hypothesis success.
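
The incremental idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Finding` and `KnowledgeBase` classes, the `process_windows` helper, the exact-string matching of claims, and the use of consecutive non-overlapping windows are all assumptions made for illustration. It shows only the general pattern of processing each time window once, labeling every incoming finding against the current knowledge base, and recording the resulting trajectory.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Finding:
    claim: str        # normalized statement (hypothetical representation)
    supports: bool    # True if the paper supports the claim, False if it refutes it
    published: date


@dataclass
class KnowledgeBase:
    # claim -> most recent stance (True = supported, False = refuted)
    stance: dict = field(default_factory=dict)
    # chronological log of (date, claim, label): the evolution trajectory
    trajectory: list = field(default_factory=list)

    def ingest(self, f: Finding) -> str:
        """Label one finding against the current state, then update the state."""
        if f.claim not in self.stance:
            label = "novel"
        elif self.stance[f.claim] == f.supports:
            label = "confirming"
        else:
            label = "contradicting"
        self.stance[f.claim] = f.supports
        self.trajectory.append((f.published, f.claim, label))
        return label


def process_windows(findings, window_starts):
    """Feed findings window by window; each finding is ingested exactly once."""
    kb = KnowledgeBase()
    ordered = sorted(findings, key=lambda f: f.published)
    # Assumption: windows are consecutive; the last one is open-ended.
    bounds = list(window_starts) + [date.max]
    for start, end in zip(bounds, bounds[1:]):
        for f in ordered:
            if start <= f.published < end:
                kb.ingest(f)
    return kb
```

Calling `process_windows` with a list of findings and monthly window start dates returns a knowledge base whose `trajectory` could then be passed to a hypothesis generator. A real system would need semantic matching of claims rather than string equality.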

πŸ“ Abstract
Scientific hypothesis generation requires tracking how knowledge evolves, not just what is currently known. We introduce Continuous Knowledge Metabolism (CKM), a framework that processes scientific literature through sliding time windows and incrementally updates a structured knowledge base as new findings arrive. We present CKM-Lite, an efficient variant that achieves strong predictive coverage through incremental accumulation, outperforming batch processing on hit rate (+2.8%, p=0.006), hypothesis yield (+3.6, p<0.001), and best-match alignment (+0.43, p<0.001) while reducing token cost by 92%. To understand what drives these differences, we develop CKM-Full, an instrumented variant that categorizes each new finding as novel, confirming, or contradicting, detects knowledge change signals, and conditions hypothesis generation on the full evolution trajectory. Analyzing 892 hypotheses generated by CKM-Full across 50 research topics, alongside parallel runs of the other variants, we report four empirical observations: (1) incremental processing outperforms the batch baseline across predictive and efficiency metrics; (2) change-aware instrumentation is associated with higher LLM-judged novelty (Cohen's d=3.46) but lower predictive coverage, revealing a quality-coverage trade-off; (3) a field's trajectory stability is associated with hypothesis success (r=-0.28, p=0.051), suggesting boundary conditions for literature-based prediction; (4) knowledge convergence signals are associated with a nearly 5x higher hit rate than contradiction signals, pointing to differential predictability across change types. These findings suggest that the character of generated hypotheses is shaped not only by how much literature is processed, but also by how it is processed. They further indicate that evaluation frameworks must account for the quality-coverage trade-off rather than optimize for a single metric.
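
For readers unfamiliar with the effect size quoted for novelty, the sketch below computes Cohen's d in its common pooled-standard-deviation form. The abstract does not say which variant of d the authors used, so this form is an assumption, and the function name `cohens_d` and any input scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)
```

By the usual rule of thumb, values around 0.8 already count as a large effect, so a d of 3.46 would indicate that the two novelty score distributions barely overlap.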
Problem

Research questions and friction points this paper is trying to address.

scientific hypothesis generation
knowledge evolution
literature-based discovery
incremental processing
predictive coverage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuous Knowledge Metabolism
incremental knowledge updating
hypothesis generation
change-aware reasoning
scientific literature evolution