🤖 AI Summary
This study addresses the uncertainty about which structural properties of knowledge graphs (KGs) actually shape the scientific hypotheses that large language models (LLMs) generate. By systematically perturbing graph density, ontology richness, topology, and control structure, the authors evaluate how each affects hypothesis generation for battery materials across three LLMs: Mistral-7B, Llama-3.1-70B, and Gemini 2.5 Flash. They find that KG utility is selective and model-dependent, and that outputs produced without a KG still recover substantial graph content from model priors. Comparing top-k subgraph extraction with random and topology-based subsets, they propose a redundancy-aware “Compressive KG” hypothesis: compact, scientifically structured subgraphs often approximate the behavior of the full KG, although the effect varies by model and much of the signal is also recoverable from random and topology-based subsets. The work points toward using KGs more efficiently to guide scientific discovery.
📝 Abstract
Knowledge graphs (KGs) can provide structured scientific context to language models, but it remains unclear which graph facts actually shape the generated hypotheses. We study KG-guided hypothesis generation for battery materials across Mistral-7B, Llama-3.1-70B, and Gemini 2.5 Flash. We perturb local KGs by varying density, ontology richness, topology, and control structure, and evaluate outputs with both provided-graph and fixed-reference metrics. Across models, KG utility is selective and model-dependent: graph context changes outputs, but no-KG outputs also recover substantial graph content from model priors. Compact top-k subgraphs often approximate full-KG behavior, including when claimed-outcome triples are held out. At the same time, compression is not unique to one semantic ranking rule: random and topology-based subsets can also recover much of the signal. These results support a redundancy-aware Compressive KG hypothesis: useful KG signal is often recoverable from compact, scientifically structured subgraphs rather than requiring the full local graph.
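The abstract does not specify the ranking rule, sampling procedure, or metric implementation, so the following is only a minimal sketch of the kinds of operations it names, under stated assumptions. It represents a KG as a list of (head, relation, tail) triples. It shows a density perturbation (random edge dropping), a semantic top-k selection, a random-k control, and a topology-based subset (ranking triples by endpoint degree). It also includes a crude provided-graph metric (the share of triples whose head and tail both appear in a generated hypothesis). All function names, the keyword-overlap score, the degree heuristic, the string-matching metric, and the toy triples are hypothetical illustrations, not the authors' code.

```python
import random
from collections import Counter

# Hypothetical representation: a KG is a list of (head, relation, tail) triples.
Triple = tuple[str, str, str]


def thin_density(triples: list[Triple], keep_frac: float, seed: int = 0) -> list[Triple]:
    """Density perturbation (assumed scheme): keep a random fraction of the edges."""
    k = round(len(triples) * keep_frac)
    return random.Random(seed).sample(triples, k)


def top_k_by_score(triples, score, k):
    """Semantic compression: keep the k highest-scoring triples (ties keep input order)."""
    return sorted(triples, key=score, reverse=True)[:k]


def random_k(triples, k, seed=0):
    """Random control: k distinct triples drawn uniformly without replacement."""
    return random.Random(seed).sample(triples, k)


def top_k_by_degree(triples, k):
    """Topology-based subset: rank triples by the summed degree of their endpoints."""
    degree = Counter()
    for head, _, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    return sorted(triples, key=lambda tr: degree[tr[0]] + degree[tr[2]], reverse=True)[:k]


def triple_coverage(text, triples):
    """Crude provided-graph metric: share of triples whose head and tail both occur in text."""
    if not triples:
        return 0.0
    low = text.lower()
    hits = sum(1 for head, _, tail in triples if head.lower() in low and tail.lower() in low)
    return hits / len(triples)


# Toy battery-materials KG, for illustration only.
kg = [
    ("LiFePO4", "has_property", "thermal stability"),
    ("LiFePO4", "used_in", "cathode"),
    ("silicon", "used_in", "anode"),
    ("silicon", "suffers_from", "volume expansion"),
    ("cathode", "part_of", "lithium-ion cell"),
    ("anode", "part_of", "lithium-ion cell"),
]
query = ("silicon", "anode")


def query_score(tr):
    # Hypothetical relevance score: how many query terms appear as triple endpoints.
    return sum(term in (tr[0], tr[2]) for term in query)


hypothesis = "Coating silicon anode particles may limit volume expansion."
print(top_k_by_score(kg, query_score, 2))  # silicon->anode, then silicon->volume expansion
print(top_k_by_degree(kg, 2))              # LiFePO4->cathode and silicon->anode (endpoint-degree sum 4)
print(triple_coverage(hypothesis, kg))     # 2 of the 6 triples are matched
```

Comparing `triple_coverage` for outputs conditioned on the full graph, the top-k subset, and the random and degree-based subsets mirrors the comparison the abstract describes in miniature. A fixed-reference metric would instead score every output against one common reference triple set rather than against whichever subgraph was provided.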