🤖 AI Summary
Existing pre-trained language model (PLM)-based knowledge graph completion (KGC) methods neglect graph structural priors and the long-tailed entity distribution, resulting in poor modeling of infrequent entities. To address this, we systematically encode knowledge graph topological features—including subgraphs, shortest paths, and degree distributions—as inductive biases in the PLM fine-tuning process. We further propose a subgraph-aware mini-batch sampling strategy and a structure-aware contrastive learning framework that jointly emphasize hard negative and hard positive triples. Evaluated on three mainstream KGC benchmarks, our approach significantly outperforms state-of-the-art PLM-based baselines. It effectively mitigates entity frequency imbalance, substantially improving prediction accuracy for long-tailed relations and sparse entities.
📝 Abstract
Fine-tuning pre-trained language models (PLMs) has recently shown potential to improve knowledge graph completion (KGC). However, most PLM-based methods focus solely on encoding textual information, neglecting the long-tailed nature of knowledge graphs and their various topological structures, e.g., subgraphs, shortest paths, and degrees. We claim that this is a major obstacle to achieving higher accuracy of PLMs for KGC. To this end, we propose a Subgraph-Aware Training framework for KGC (SATKGC) with two ideas: (i) subgraph-aware mini-batching to encourage hard negative sampling and to mitigate the imbalance in entity occurrence frequency during training, and (ii) a new contrastive learning method that focuses more on harder in-batch negative triples and harder positive triples in terms of the structural properties of the knowledge graph. To the best of our knowledge, this is the first study to comprehensively incorporate the structural inductive bias of the knowledge graph into fine-tuning PLMs. Extensive experiments on three KGC benchmarks demonstrate the superiority of SATKGC. Our code is available.
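The two ideas in the abstract can be sketched in toy form. This is a minimal illustration, not the paper's exact formulation: the function names (`bfs_subgraph_batch`, `structure_weighted_infonce`), the BFS batching, and the inverse-shortest-path-distance weighting of negatives are assumptions chosen to convey the intuition that entities from the same subgraph make hard in-batch negatives, and that structurally closer negatives should contribute more to the contrastive loss.

```python
import math
from collections import deque

def bfs_subgraph_batch(adj, start, batch_size):
    # Collect entities breadth-first from a center entity. Entities in the
    # same local subgraph are semantically related, so in-batch negatives
    # drawn from this batch tend to be hard (plausible but incorrect),
    # and each entity is visited roughly as often as its subgraph is
    # sampled, mitigating frequency imbalance.
    visited, batch = {start}, [start]
    queue = deque([start])
    while queue and len(batch) < batch_size:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in visited and len(batch) < batch_size:
                visited.add(v)
                batch.append(v)
                queue.append(v)
    return batch

def structure_weighted_infonce(pos_score, neg_scores, neg_distances, tau=0.5):
    # InfoNCE-style contrastive loss where each in-batch negative is
    # up-weighted by inverse graph distance: structurally closer
    # negatives are treated as harder and penalized more.
    weights = [1.0 / d for d in neg_distances]
    pos = math.exp(pos_score / tau)
    denom = pos + sum(w * math.exp(s / tau)
                      for w, s in zip(weights, neg_scores))
    return -math.log(pos / denom)
```

With the same model scores, negatives that are closer in the graph (smaller `neg_distances`) yield a larger loss than distant ones, which is the sense in which the training focuses on harder negatives.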