🤖 AI Summary
This study investigates how topological properties of biomedical knowledge graphs (KGs) influence link prediction performance. We systematically analyze structural characteristics—including sparsity, degree distribution, clustering coefficient, and relation symmetry—across benchmark datasets (e.g., DrugBank, Hetionet), and evaluate their impact on KG completion using representative embedding models (TransE, RotatE, ComplEx) via controlled ablation studies. Our key contribution is the first empirical quantification of associations between graph-structural metrics and link prediction accuracy (measured by MRR and mean rank), achieving an R² of 0.73. We identify local clustering coefficient and relation symmetry as the most predictive topological factors. To ensure reproducibility and facilitate structural attribution analysis, we publicly release all prediction results and a dedicated analytical toolkit. This work establishes interpretable, topology-aware design principles for biomedical KG modeling, bridging structural graph theory with practical knowledge representation tasks.
📝 Abstract
Knowledge Graph Completion has been increasingly adopted as a useful method for several tasks in biomedical research, like drug repurposing or drug-target identification. To that end, a variety of datasets and Knowledge Graph Embedding models has been proposed over the years. However, little is known about the properties that render a dataset useful for a given task and, even though theoretical properties of Knowledge Graph Embedding models are well understood, their practical utility in this field remains controversial. We conduct a comprehensive investigation into the topological properties of publicly available biomedical Knowledge Graphs and establish links to the accuracy observed in real-world applications. By releasing all model predictions and a new suite of analysis tools we invite the community to build upon our work and continue improving the understanding of these crucial applications.