🤖 AI Summary
To address the challenges of false negatives, uncontrollable hardness, and poor generalization in negative sampling for knowledge graph completion, this paper proposes an adaptive negative sampling framework based on difficulty-aware diffusion models. The method jointly leverages semantic and structural features to quantify entity learning difficulty, constructs a conditional diffusion model that incorporates neighborhood information and dynamic noise scheduling to generate negative samples with controllable hardness, and introduces a curriculum-based dynamic training strategy that progresses from easy to hard instances. Compared with conventional approaches, the proposed framework significantly mitigates the false-negative problem while enhancing the discriminability and diversity of negative samples. Extensive experiments on six benchmark datasets demonstrate consistent superiority over existing methods; notably, the framework achieves new state-of-the-art performance on all three core metrics on UMLS and YAGO3-10.
📝 Abstract
Negative sampling strategies play a crucial role in knowledge graph representation learning. To overcome the limitations of existing strategies, such as vulnerability to false negatives, limited generalization, and lack of control over sample hardness, we propose DANS-KGC (Diffusion-based Adaptive Negative Sampling for Knowledge Graph Completion). DANS-KGC comprises three key components: the Difficulty Assessment Module (DAM), the Adaptive Negative Sampling Module (ANS), and the Dynamic Training Mechanism (DTM). DAM evaluates the learning difficulty of entities by integrating semantic and structural features. Based on this assessment, ANS employs a conditional diffusion model with difficulty-aware noise scheduling, leveraging semantic and neighborhood information during the denoising phase to generate negative samples of diverse hardness. DTM further enhances learning by dynamically adjusting the hardness distribution of negative samples throughout training, enabling a curriculum-style progression from easy to hard examples. Extensive experiments on six benchmark datasets demonstrate the effectiveness and generalization ability of DANS-KGC, which achieves state-of-the-art results on all three evaluation metrics for the UMLS and YAGO3-10 datasets.
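The two scheduling ideas in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`noise_schedule`, `curriculum_hardness`), the linear beta schedule, the cosine ramp, and all parameter values are hypothetical assumptions chosen only to make the two mechanisms concrete — a difficulty-conditioned noise schedule for the diffusion model, and a target hardness that grows from easy to hard over training.

```python
import math

def noise_schedule(difficulty, num_steps=50, beta_min=1e-4, beta_max=0.02):
    """Hypothetical difficulty-aware linear beta schedule.

    Higher difficulty (in [0, 1]) shrinks the terminal noise level, so the
    denoised sample stays closer to the conditioning entity embedding,
    yielding a harder negative. The linear form is an illustrative choice.
    """
    # Scale the maximum noise level down as difficulty increases.
    scaled_max = beta_min + (beta_max - beta_min) * (1.0 - difficulty)
    return [beta_min + (scaled_max - beta_min) * t / (num_steps - 1)
            for t in range(num_steps)]

def curriculum_hardness(epoch, total_epochs, h_start=0.1, h_end=0.9):
    """Hypothetical easy-to-hard curriculum: the target negative-sample
    hardness ramps smoothly (cosine schedule) as training progresses."""
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    return h_start + (h_end - h_start) * 0.5 * (1.0 - math.cos(math.pi * progress))
```

Under this sketch, an easy entity (difficulty 0) gets the full noise range, while a hard entity (difficulty 1) gets an almost flat, low-noise schedule; the curriculum starts near `h_start` and ends near `h_end`, matching the abstract's easy-to-hard progression.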