Enhancing PyKEEN with Multiple Negative Sampling Solutions for Knowledge Graph Embedding Models

📅 2025-08-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Knowledge graph embedding (KGE) training is often hindered by the scarcity of high-quality negative samples, while existing toolkits support only basic negative sampling strategies. To address this limitation, we extend the PyKEEN framework with a modular negative sampling architecture that integrates diverse static and dynamic methods—including policy-driven triple corruption—ensuring compatibility, extensibility, and customization flexibility. Our approach significantly enhances negative sample relevance and training efficacy. Comprehensive experiments on standard link prediction benchmarks demonstrate consistent performance gains across multiple KGE models (e.g., TransE, RotatE, ComplEx), yielding average MRR improvements of 3.2–7.8%. The proposed solution provides a reproducible, production-ready framework for efficient and robust KGE training.

📝 Abstract
Embedding methods have become popular due to their scalability on link prediction and/or triple classification tasks on Knowledge Graphs. Embedding models are trained on both positive and negative samples of triples. However, in the absence of negative assertions, these must usually be generated artificially using various negative sampling strategies, ranging from random corruption to more sophisticated techniques, which have an impact on overall performance. Most popular libraries for knowledge graph embedding support only basic such strategies and lack advanced solutions. To address this gap, we deliver an extension for the popular KGE framework PyKEEN that integrates a suite of advanced negative samplers (including both static and dynamic corruption strategies) within a consistent modular architecture to generate meaningful negative samples, while remaining compatible with existing PyKEEN-based workflows and pipelines. The developed extension not only enhances PyKEEN itself but also allows for easier and more comprehensive development and customization of embedding methods. As a proof of concept, we present a comprehensive empirical study of the developed extensions and their impact on the link prediction performance of different embedding methods, which also provides useful insights for the design of more effective strategies.
Problem

Research questions and friction points this paper is trying to address.

Enhancing negative sampling for knowledge graph embedding models
Integrating advanced negative samplers into PyKEEN framework
Improving link prediction performance with dynamic corruption strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends PyKEEN with advanced negative samplers
Supports both static and dynamic corruption strategies
Ensures compatibility with existing PyKEEN workflows
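
To make the baseline concrete, below is a minimal, self-contained sketch of the basic random-corruption strategy that the advanced static and dynamic samplers build upon: each positive triple (head, relation, tail) yields negatives by replacing either the head or the tail with a random entity. The function name and signature are illustrative, not PyKEEN's actual API.

```python
import random

def corrupt_triples(triples, num_entities, num_negs_per_pos=2, seed=0):
    """Illustrative basic negative sampler (not PyKEEN's API).

    For each positive (h, r, t), generate `num_negs_per_pos` negatives by
    replacing either the head or the tail with a random entity id.
    Resamples when the replacement equals the original entity, so every
    negative differs from its positive in exactly one position.
    """
    rng = random.Random(seed)
    negatives = []
    for h, r, t in triples:
        for _ in range(num_negs_per_pos):
            corrupt_head = rng.random() < 0.5  # pick which side to corrupt
            original = h if corrupt_head else t
            replacement = original
            while replacement == original:
                replacement = rng.randrange(num_entities)
            negatives.append((replacement, r, t) if corrupt_head
                             else (h, r, replacement))
    return negatives
```

More sophisticated strategies, as integrated by the extension, differ in how the replacement entity is chosen (e.g., by type constraints, model scores, or a learned policy) rather than in this overall corruption loop.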