AI Summary
This work addresses efficient and accurate link prediction on the ultra-large-scale knowledge graph WikiKG90Mv2 by proposing a two-stage "retrieve-and-rerank" framework designed with both efficiency and accuracy in mind. The approach first uses a priority infilling retrieval model to identify candidate entities that are structurally and semantically similar. A reranking model then combines neighbor-enhanced entity representations with ensemble learning to refine the prediction scores over the retrieved candidates. On the WikiKG90Mv2 validation set, the framework raises the Mean Reciprocal Rank (MRR) from 0.2342 to 0.2839, outperforming existing baseline methods.
Abstract
WikiKG90Mv2, featured at NeurIPS 2022, is a large encyclopedic knowledge graph. Embedding knowledge graphs into continuous vector spaces is important for many practical applications, such as knowledge acquisition, question answering, and recommendation systems. Compared to existing knowledge graphs, WikiKG90Mv2 is a large-scale knowledge graph composed of more than 90 million entities. Both efficiency and accuracy must be considered when building graph embedding models for knowledge graphs at this scale. To this end, we follow the retrieve-then-re-rank pipeline and make novel modifications to both the retrieval and re-ranking stages. Specifically, we propose a priority infilling retrieval model to obtain candidates that are structurally and semantically similar. We then propose an ensemble-based re-ranking model with neighbor-enhanced representations to produce the final link prediction results among the retrieved candidates. Experimental results show that our proposed method outperforms existing baseline methods and improves MRR on the validation set from 0.2342 to 0.2839.
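The general shape of a retrieve-then-re-rank pipeline, and the way MRR rewards moving the correct entity up the list, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the priority infilling retriever or the ensemble re-ranker, so the cheap and expensive scorers below are hypothetical placeholders, and the function names `retrieve`, `rerank`, and `mean_reciprocal_rank` are assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def retrieve(cheap_scores: Sequence[float], k: int) -> List[int]:
    """Stage 1 (placeholder): keep the k entity ids with the highest cheap score."""
    order = sorted(range(len(cheap_scores)),
                   key=lambda e: cheap_scores[e], reverse=True)
    return order[:k]


def rerank(candidates: Sequence[int],
           expensive_score: Callable[[int], float]) -> List[int]:
    """Stage 2 (placeholder): reorder only the retrieved candidates
    with a costlier scoring function."""
    return sorted(candidates, key=expensive_score, reverse=True)


def mean_reciprocal_rank(ranked_lists: Sequence[Sequence[int]],
                         gold_tails: Sequence[int]) -> float:
    """Average of 1/rank of the correct entity; a query whose correct
    entity is missing from its list contributes 0."""
    if not gold_tails:
        return 0.0
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_tails):
        if gold in ranked:
            total += 1.0 / (list(ranked).index(gold) + 1)
    return total / len(gold_tails)


# Toy query over 6 entities; entity 5 is the correct tail (made-up numbers).
cheap = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7]
candidates = retrieve(cheap, k=3)               # [1, 3, 5]
fine = {1: 0.2, 3: 0.9, 5: 0.5}                 # hypothetical re-ranker scores
final = rerank(candidates, lambda e: fine[e])   # [3, 5, 1]

mrr_before = mean_reciprocal_rank([candidates], [5])  # correct entity at rank 3
mrr_after = mean_reciprocal_rank([final], [5])        # correct entity at rank 2
```

The split keeps the expensive scorer off the full entity set: it runs only on the k retrieved candidates, which is what makes the approach workable with tens of millions of entities. The cost is that any correct entity the retriever misses cannot be recovered by the re-ranker.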