Enhancing PyKEEN with Multiple Negative Sampling Solutions for Knowledge Graph Embedding Models

📅 2025-08-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Knowledge graph embedding (KGE) training is often hindered by the scarcity of high-quality negative samples, while existing toolkits support only basic negative sampling strategies. To address this limitation, we extend the PyKEEN framework with a modular negative sampling architecture that integrates diverse static and dynamic methods—including policy-driven triple corruption—ensuring compatibility, extensibility, and customization flexibility. Our approach significantly enhances negative sample relevance and training efficacy. Comprehensive experiments on standard link prediction benchmarks demonstrate consistent performance gains across multiple KGE models (e.g., TransE, RotatE, ComplEx), yielding average MRR improvements of 3.2–7.8%. The proposed solution provides a reproducible, production-ready framework for efficient and robust KGE training.

📝 Abstract
Embedding methods have become popular due to their scalability on link prediction and/or triple classification tasks on Knowledge Graphs. Embedding models are trained on both positive and negative samples of triples. However, in the absence of negative assertions, these must usually be generated artificially using various negative sampling strategies, ranging from random corruption to more sophisticated techniques, which have an impact on overall performance. Most popular libraries for knowledge graph embedding support only basic such strategies and lack advanced solutions. To address this gap, we deliver an extension for the popular KGE framework PyKEEN that integrates a suite of advanced negative samplers (including both static and dynamic corruption strategies) within a consistent modular architecture to generate meaningful negative samples, while remaining compatible with existing PyKEEN-based workflows and pipelines. The developed extension not only enhances PyKEEN itself but also allows for easier and more comprehensive development and customization of embedding methods. As a proof of concept, we present a comprehensive empirical study of the developed extensions and their impact on the link prediction performance of different embedding methods, which also provides useful insights for the design of more effective strategies.
Problem

Research questions and friction points this paper is trying to address.

Enhancing negative sampling for knowledge graph embedding models
Integrating advanced negative samplers into PyKEEN framework
Improving link prediction performance with dynamic corruption strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends PyKEEN with advanced negative samplers
Supports both static and dynamic corruption strategies
Ensures compatibility with existing PyKEEN workflows
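
To make the baseline concrete, below is a minimal, self-contained sketch of the basic random-corruption strategy that the advanced static and dynamic samplers build upon: each positive triple (head, relation, tail) yields negatives by replacing either the head or the tail with a random entity. The function name and signature are illustrative, not PyKEEN's actual API.

```python
import random

def corrupt_triples(triples, num_entities, num_negs_per_pos=2, seed=0):
    """Illustrative basic negative sampler (not PyKEEN's API).

    For each positive (h, r, t), generate `num_negs_per_pos` negatives by
    replacing either the head or the tail with a random entity id.
    Resamples when the replacement equals the original entity, so every
    negative differs from its positive in exactly one position.
    """
    rng = random.Random(seed)
    negatives = []
    for h, r, t in triples:
        for _ in range(num_negs_per_pos):
            corrupt_head = rng.random() < 0.5  # pick which side to corrupt
            original = h if corrupt_head else t
            replacement = original
            while replacement == original:
                replacement = rng.randrange(num_entities)
            negatives.append((replacement, r, t) if corrupt_head
                             else (h, r, replacement))
    return negatives
```

More sophisticated strategies, as integrated by the extension, differ in how the replacement entity is chosen (e.g., by type constraints, model scores, or a learned policy) rather than in this overall corruption loop.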