The Compressive Knowledge Graph Hypothesis: Which Graph Facts Matter for Scientific Hypothesis Generation?

📅 2026-05-26
🤖 AI Summary
This study asks which structural properties of knowledge graphs (KGs) actually influence the scientific hypotheses that large language models (LLMs) generate. The authors perturb the density, ontology richness, topology, and control structure of local KGs and measure the effect on hypothesis generation for battery materials across three LLMs: Mistral-7B, Llama-3.1-70B, and Gemini 2.5 Flash. They find that KG utility is selective and model-dependent, and that compact top-k subgraphs often approximate full-KG behavior. Random and topology-based subsets also recover much of the signal, so the compression is not tied to one semantic ranking rule. On this basis they propose a redundancy-aware Compressive KG hypothesis: useful KG signal is often recoverable from compact, scientifically structured subgraphs rather than requiring the full local graph.
📝 Abstract
Knowledge graphs (KGs) can provide structured scientific context to language models, but it remains unclear which graph facts actually shape the generated hypotheses. We study KG-guided hypothesis generation for battery materials across Mistral-7B, Llama-3.1-70B, and Gemini 2.5 Flash. We perturb local KGs by varying density, ontology richness, topology, and control structure, and evaluate outputs with both provided-graph and fixed-reference metrics. Across models, KG utility is selective and model-dependent: graph context changes outputs, but no-KG outputs also recover substantial graph content from model priors. Compact top-k subgraphs often approximate full-KG behavior, including when claimed-outcome triples are held out. At the same time, compression is not unique to one semantic ranking rule: random and topology-based subsets can also recover much of the signal. These results support a redundancy-aware Compressive KG hypothesis: useful KG signal is often recoverable from compact, scientifically structured subgraphs rather than requiring the full local graph.
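To make the three kinds of compact subgraph named in the abstract concrete, here is a minimal Python sketch of a score-ranked top-k subset, a random subset, and a topology-based subset. It is an illustration under stated assumptions, not the authors' method: the function names, the per-triple relevance scores, the use of node degree as the topology signal, and the toy triples are all hypothetical and do not come from the paper.

```python
import random
from collections import Counter

# A KG fact as a (head, relation, tail) triple. Toy data for illustration only.
Triple = tuple[str, str, str]

toy_triples: list[Triple] = [
    ("LiFePO4", "has_structure", "olivine"),
    ("LiFePO4", "used_as", "cathode"),
    ("graphite", "used_as", "anode"),
    ("LiFePO4", "contains", "Fe"),
    ("NMC811", "used_as", "cathode"),
]


def top_k_by_score(triples, scores, k):
    """Keep the k triples with the highest score.

    `scores` stands in for some semantic ranking rule (assumed here);
    ties keep their original order because sorted() is stable.
    """
    ranked = sorted(zip(triples, scores), key=lambda pair: pair[1], reverse=True)
    return [triple for triple, _ in ranked[:k]]


def random_subset(triples, k, seed=0):
    """Keep k triples chosen uniformly at random, reproducibly via `seed`."""
    rng = random.Random(seed)
    return rng.sample(triples, min(k, len(triples)))


def top_k_by_degree(triples, k):
    """Keep the k triples whose endpoints are best connected.

    Node degree is one simple, assumed choice of topology signal.
    """
    degree = Counter()
    for head, _, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    ranked = sorted(triples, key=lambda t: degree[t[0]] + degree[t[2]], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    hypothetical_scores = [0.9, 0.2, 0.1, 0.8, 0.3]
    print(top_k_by_score(toy_triples, hypothetical_scores, k=2))
    print(random_subset(toy_triples, k=2))
    print(top_k_by_degree(toy_triples, k=2))
```

Each function returns a list of triples that could be serialized into a prompt in place of the full local graph, which is the kind of comparison the abstract describes across selection strategies.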
Problem

Research questions and friction points this paper is trying to address.

knowledge graph
hypothesis generation
graph facts
scientific context
compressive representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compressive Knowledge Graph
hypothesis generation
knowledge graph compression
scientific reasoning
large language models
Shashwat Sourav
Washington University in St. Louis
Viktoriia Baibakova
Lila Sciences
Sanjay Das
University of Texas at Dallas
Deep learning, Hardware Accelerators, Hardware testing & security, Functional safety
Ran Elgedawy
Oak Ridge National Laboratory
Maria Mahbub
Oak Ridge National Laboratory
Emily Herron
Oak Ridge National Laboratory
Tirthankar Ghosal
Oak Ridge National Laboratory
Natural Language Processing, Machine Learning, Artificial Intelligence, Information Extraction