Optimal Embedding Guided Negative Sample Generation for Knowledge Graph Link Prediction

📅 2025-04-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
In knowledge graph link prediction, low-quality negative samples severely hinder the performance of embedding models. To address this, we propose EMU, a novel framework that theoretically derives a sufficient condition for the effectiveness of negative sample distributions. EMU shifts from conventional sampling to a *generative* negative sampling paradigm: guided by optimal embedding conditions, it actively constructs negative triples via embedding perturbations—including directional disturbance and norm scaling—that provably satisfy theoretical optimality. EMU is model-agnostic and seamlessly integrates with any knowledge graph embedding (KGE) model (e.g., TransE, ComplEx, RotatE) and existing sampling strategies, supporting end-to-end joint training. Extensive experiments on standard benchmarks (FB15k-237, WN18RR) demonstrate significant improvements in MRR and Hits@1—equivalent to increasing embedding dimensionality fivefold. The code is publicly available.
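As a rough illustration of the generative paradigm described above, the sketch below perturbs a positive tail embedding with a directional disturbance and a norm scaling, then checks that the mutated embedding scores lower under a TransE-style scorer. All function names and hyperparameters here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def transe_score(h, r, t):
    # TransE plausibility: negative L2 distance between h + r and t.
    return -np.linalg.norm(h + r - t)

def mutate_embedding(t, noise_scale=0.1, norm_range=(0.8, 1.2), rng=rng):
    """Generate a "mutated" negative embedding from a positive tail
    embedding t. Hyperparameter names are illustrative only."""
    direction = rng.normal(size=t.shape)
    direction /= np.linalg.norm(direction)   # unit-norm random direction
    perturbed = t + noise_scale * direction  # directional disturbance
    scale = rng.uniform(*norm_range)         # norm scaling
    return scale * perturbed

# Toy positive triple in a 4-dimensional embedding space.
h = rng.normal(size=4)
r = rng.normal(size=4)
t = h + r  # a "perfect" positive: transe_score(h, r, t) == 0

neg_t = mutate_embedding(t)
# The mutated embedding should be less plausible than the positive one.
assert transe_score(h, r, t) >= transe_score(h, r, neg_t)
```

In the actual framework, such generated negatives would feed the same ranking loss as sampled negatives, which is what makes the approach compatible with existing KGE models and sampling strategies.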

📝 Abstract
Knowledge graph embedding (KGE) models encode the structural information of knowledge graphs to predict new links. Effective training of these models requires distinguishing between positive and negative samples with high precision. Although prior research has shown that improving the quality of negative samples can significantly enhance model accuracy, identifying high-quality negative samples remains a challenging problem. This paper theoretically investigates the condition under which negative samples lead to optimal KG embedding and identifies a sufficient condition for an effective negative sample distribution. Based on this theoretical foundation, we propose **E**mbedding **MU**tation (**EMU**), a novel framework that *generates* negative samples satisfying this condition, in contrast to conventional methods that focus on *identifying* challenging negative samples within the training data. Importantly, the simplicity of EMU ensures seamless integration with existing KGE models and negative sampling methods. To evaluate its efficacy, we conducted comprehensive experiments across multiple datasets. The results consistently demonstrate significant improvements in link prediction performance across various KGE models and negative sampling methods. Notably, EMU enables performance improvements comparable to those achieved by models with an embedding dimension five times larger. An implementation of the method and experiments is available at https://github.com/nec-research/EMU-KG.
Problem

Research questions and friction points this paper is trying to address.

Improving negative sample quality for knowledge graph link prediction
Theoretical conditions for optimal negative sample distribution
Generating high-quality negative samples via Embedding Mutation (EMU)
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes Embedding Mutation (EMU) framework
Generates negative samples that provably satisfy a theoretical optimality condition
Seamlessly integrates with existing KGE models