AEGIS: Authentic Edge Growth In Sparsity for Link Prediction in Edge-Sparse Bipartite Knowledge Graphs

📅 2025-09-26

🤖 AI Summary

To address link prediction in sparse bipartite knowledge graphs from niche domains, this paper proposes AEGIS, a framework that avoids synthetic node generation and instead resamples only observed training edges, preserving the original node set and the graph's structural fidelity. Its copy-based variants use either uniform edge resampling or inverse-degree-biased resampling, and the paper also evaluates a semantic KNN-based augmentation. To simulate extreme sparsity, the denser benchmarks are thinned with high-rate bond percolation. Crucially, AEGIS never expands the node set. Experiments on Amazon, MovieLens, and a game-design-pattern knowledge graph, scored with AUC-ROC and the Brier score, show that on Amazon and MovieLens the copy-based variants match the sparse baseline, while semantic KNN augmentation is the only method that restores AUC and calibration. On the text-rich game-design graph, semantic KNN yields the largest AUC gain and Brier score reduction, indicating better discrimination and probability calibration when informative node descriptions are available.
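A minimal Python sketch of the two copy-based strategies, assuming edges are stored as (left, right) tuples with distinct left and right node ids, and assuming "inverse-degree bias" means weighting each edge by 1/deg(u) + 1/deg(v). The paper's exact weighting is not given in this summary, and the names `degree_counts`, `resample_edges`, `augment` and the `ratio` parameter are hypothetical. Resampling with replacement only duplicates observed edges, so no new endpoint can appear.

```python
import random
from collections import Counter


def degree_counts(edges):
    """Count how many training edges touch each node (left/right ids assumed distinct)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def resample_edges(edges, n_new, mode="simple", seed=0):
    """Draw n_new edges, with replacement, from the observed training edges.

    mode="simple": every edge is equally likely (uniform resampling).
    mode="degree_aware": edges touching low-degree nodes are favoured,
    using the ASSUMED weight 1/deg(u) + 1/deg(v).
    Only existing edges are returned, so the node set never grows.
    """
    rng = random.Random(seed)
    if mode == "simple":
        weights = None
    elif mode == "degree_aware":
        deg = degree_counts(edges)
        weights = [1.0 / deg[u] + 1.0 / deg[v] for u, v in edges]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return rng.choices(edges, weights=weights, k=n_new)


def augment(edges, ratio=0.5, mode="simple", seed=0):
    """Hypothetical helper: grow the training set by `ratio` with resampled copies."""
    extra = resample_edges(edges, int(len(edges) * ratio), mode=mode, seed=seed)
    return edges + extra


# Usage: a toy user-item graph with four training edges.
train = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u3", "i3")]
aug = augment(train, ratio=1.0, mode="degree_aware")
```

In the degree-aware mode the edge ("u3", "i3"), whose endpoints both have degree 1, gets the largest weight, which is how this sketch steers extra training signal toward sparsely connected nodes.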

📝 Abstract

Bipartite knowledge graphs in niche domains are typically data-poor and edge-sparse, which hinders link prediction. We introduce AEGIS (Authentic Edge Growth In Sparsity), an edge-only augmentation framework that resamples existing training edges, either uniformly (simple) or with an inverse-degree bias (degree-aware), thereby preserving the original node set and sidestepping fabricated endpoints. To probe authenticity across regimes, we consider a naturally sparse graph (the game-pattern network of game design patterns, GDP) and induce sparsity in denser benchmarks (Amazon, MovieLens) via high-rate bond percolation. We evaluate augmentations on two complementary metrics, AUC-ROC (higher is better) and the Brier score (lower is better), using two-tailed paired t-tests against sparse baselines. On Amazon and MovieLens, copy-based AEGIS variants match the baseline, while semantic KNN augmentation is the only method that restores AUC and calibration; random and synthetic edges remain detrimental. On the text-rich GDP graph, semantic KNN achieves the largest AUC improvement and Brier score reduction, and the simple variant also lowers the Brier score relative to the sparse control. These findings position authenticity-constrained resampling as a data-efficient strategy for sparse bipartite link prediction, with semantic augmentation providing an additional boost when informative node descriptions are available.
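Bond percolation removes each edge ("bond") independently at random; at a high removal rate it turns a dense benchmark into an edge-sparse one. This Python sketch assumes a single `removal_rate` applied uniformly to a list of edges; the function name `bond_percolate` and the synthetic 50-by-20 grid are illustrative, not the paper's actual setup.

```python
import random


def bond_percolate(edges, removal_rate, seed=0):
    """Bond percolation: keep each edge independently with probability 1 - removal_rate."""
    if not 0.0 <= removal_rate <= 1.0:
        raise ValueError("removal_rate must lie in [0, 1]")
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so an edge survives with probability 1 - removal_rate
    return [e for e in edges if rng.random() >= removal_rate]


# Usage: thin a dense 50-by-20 user-item grid (1000 edges) at a 90% removal rate.
dense = [(f"u{i}", f"i{j}") for i in range(50) for j in range(20)]
sparse = bond_percolate(dense, removal_rate=0.9)
```

Because edges are dropped independently, some nodes can lose all of their edges, so an experiment built this way has to decide whether such isolated nodes remain in the evaluation.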

Problem

Research questions and friction points this paper is trying to address.

Addressing link prediction challenges in edge-sparse bipartite knowledge graphs

Proposing edge-only augmentation to preserve nodes and avoid fabricated endpoints

Evaluating authenticity-constrained resampling for sparse bipartite link prediction
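The evaluation described in the abstract scores each run with AUC-ROC and the Brier score and compares augmented runs with sparse baselines using paired t-tests. A stdlib-only Python sketch of the three quantities follows; the pairwise AUC costs O(positives × negatives) and is meant for illustration, and pairing runs by random seed (as in the made-up numbers below) is an assumption.

```python
import math
from statistics import mean, stdev


def auc_roc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    return mean((p - y) ** 2 for y, p in zip(labels, probs))


def paired_t_statistic(a, b):
    """t statistic of the paired differences a[i] - b[i], e.g. per-seed scores."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Usage with toy predictions and made-up per-seed AUCs (not results from the paper).
labels = [1, 1, 0, 0]
probs = [0.9, 0.6, 0.4, 0.2]
auc = auc_roc(labels, probs)        # 1.0: every positive outranks every negative
brier = brier_score(labels, probs)  # ≈ 0.0925
t = paired_t_statistic([0.71, 0.73, 0.72], [0.70, 0.70, 0.70])  # ≈ 3.46
```

For a two-tailed p-value, `scipy.stats.ttest_rel(a, b)` computes the same paired statistic together with the p-value. Note that `paired_t_statistic` raises `ZeroDivisionError` when all differences are identical.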

Innovation

Methods, ideas, or system contributions that make the work stand out.

Edge-only augmentation resamples existing training edges

Preserves original node set to avoid fabricated endpoints

Uses semantic KNN augmentation, which gives the strongest gains when node descriptions are informative
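The summary does not spell out how the semantic KNN augmentation picks its edges, so the following is only a hypothetical Python sketch of one plausible variant: each right-side node (for example, a design pattern) has a precomputed text-embedding vector, and for every observed edge (u, v) the sketch proposes (u, w) for the k right-side nodes w most cosine-similar to v. The names `cosine`, `semantic_knn_edges`, `right_emb` and `k` are assumptions, and only nodes already in the graph are used.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def semantic_knn_edges(edges, right_emb, k=1):
    """HYPOTHETICAL: propose (u, w) where w is among the k right-side nodes whose
    description embedding is closest to an existing neighbour v of u.

    right_emb maps every right-side node to a precomputed embedding vector.
    Already-observed edges are never proposed again.
    """
    existing = set(edges)
    proposals = set()
    for u, v in edges:
        # Rank every other right-side node by similarity to the current neighbour v.
        candidates = sorted(
            (w for w in right_emb if w != v),
            key=lambda w: cosine(right_emb[v], right_emb[w]),
            reverse=True,
        )
        for w in candidates[:k]:
            if (u, w) not in existing:
                proposals.add((u, w))
    return sorted(proposals)


# Usage: two games, three patterns with toy 2-d "description embeddings".
edges = [("g1", "p1"), ("g2", "p3")]
right_emb = {"p1": [1.0, 0.0], "p2": [0.9, 0.1], "p3": [0.0, 1.0]}
new_edges = semantic_knn_edges(edges, right_emb, k=1)
# [('g1', 'p2'), ('g2', 'p2')]
```

Here "p2" is nearest to "p1" and also closer to "p3" than "p1" is, so both games receive a proposed link to "p2". In a real pipeline the embeddings would come from a text encoder applied to node descriptions, which is why such a method can only help when those descriptions are informative.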

🔎 Similar Papers

The Role of Graph Topology in the Performance of Biomedical Knowledge Graph Completion Models