The Role of Graph Topology in the Performance of Biomedical Knowledge Graph Completion Models

📅 2024-09-06
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates how topological properties of biomedical knowledge graphs (KGs) influence link prediction performance. We systematically analyze structural characteristics—including sparsity, degree distribution, clustering coefficient, and relation symmetry—across benchmark datasets (e.g., DrugBank, Hetionet), and evaluate their impact on KG completion using representative embedding models (TransE, RotatE, ComplEx) via controlled ablation studies. Our key contribution is the first empirical quantification of associations between graph-structural metrics and link prediction accuracy (measured by MRR and mean rank), achieving an R² of 0.73. We identify local clustering coefficient and relation symmetry as the most predictive topological factors. To ensure reproducibility and facilitate structural attribution analysis, we publicly release all prediction results and a dedicated analytical toolkit. This work establishes interpretable, topology-aware design principles for biomedical KG modeling, bridging structural graph theory with practical knowledge representation tasks.

Technology Category

Application Category

📝 Abstract
Knowledge Graph Completion has been increasingly adopted as a useful method for several tasks in biomedical research, like drug repurposing or drug-target identification. To that end, a variety of datasets and Knowledge Graph Embedding models has been proposed over the years. However, little is known about the properties that render a dataset useful for a given task and, even though theoretical properties of Knowledge Graph Embedding models are well understood, their practical utility in this field remains controversial. We conduct a comprehensive investigation into the topological properties of publicly available biomedical Knowledge Graphs and establish links to the accuracy observed in real-world applications. By releasing all model predictions and a new suite of analysis tools we invite the community to build upon our work and continue improving the understanding of these crucial applications.
Problem

Research questions and friction points this paper is trying to address.

Investigates how graph topology affects biomedical knowledge graph completion performance
Examines link between topological properties and real-world task accuracy
Addresses lack of understanding about dataset properties for biomedical tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzing biomedical knowledge graph topology
Linking graph properties to task accuracy
Releasing predictive models and analysis tools
A
A. Cattaneo
Graphcore Research, Bristol, UK
S
Stephen Bonner
Data Sciences and Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
T
Thomas Martynec
Data Sciences and Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
Carlo Luschi
Carlo Luschi
VP & Head of Research, Graphcore
Artificial IntelligenceNeural NetworksDeep LearningGraph Learning
I
Ian P Barrett
Data Sciences and Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
Daniel Justus
Daniel Justus
Graphcore Research, Bristol, UK