🤖 AI Summary
Existing link prediction methods for hyper-relational knowledge graphs assume that only a single element of a fact is missing, so they cannot handle cases where multiple, or even all, components of a fact are missing. This work introduces the task of "fact generation" and proposes KREPE, the first generative representation learning method for hyper-relational knowledge graphs. Using a masked discrete diffusion mechanism, the approach models both intra-fact dependencies and global structural context, enabling fact completion under arbitrary masking patterns as well as generating facts from scratch. The method achieves state-of-the-art performance on standard link prediction benchmarks and outperforms LLM-based baselines in generating facts that are both novel and correct.
📝 Abstract
Hyper-relational knowledge graphs (HKGs) effectively represent complex facts. While inferring new knowledge in HKGs is a critical problem, current methods cast it as simple link prediction: they assume that nearly all entities and relations within a fact are known, leaving only a single blank to be filled. However, this restrictive assumption may not hold in real-world scenarios, where multiple, or even all, components of a fact may be missing simultaneously. To bridge this gap, we introduce a task called fact generation: generating a valid hyper-relational fact from an arbitrarily masked query, i.e., completing a partially observed fact or generating a fact from scratch. We propose KREPE, the first generative representation learning method for HKGs, which uses masked discrete diffusion to learn the probability distributions of missing components conditioned on the local components of the fact and the global structure of the HKG. KREPE models intra-fact dependencies through contextual message passing and inter-fact correlations by aggregating stochastically sampled contexts. It unifies link prediction and fact generation within a single training framework, achieving state-of-the-art performance on standard HKG link prediction benchmarks and outperforming LLM-based baselines in generating novel and correct facts.
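To make "arbitrarily masked query" and "masked discrete diffusion" concrete, here is a minimal sketch of the general technique, not KREPE's implementation. It assumes a fact is serialized as a flat token list (head, relation, tail, then qualifier key/value pairs), uses a single absorbing `[MASK]` token with a uniform per-token masking probability `t`, and unmasks positions left to right for simplicity. The `predict` callable is a hypothetical stand-in for a learned denoiser; KREPE's denoiser (contextual message passing plus stochastically sampled global contexts) is not modeled here.

```python
import random
from typing import Callable, List, Sequence, Tuple

MASK = "[MASK]"  # assumed absorbing-state token

# A hyper-relational fact: main triple plus qualifier (key, value) pairs.
Fact = Tuple[str, str, str, List[Tuple[str, str]]]


def flatten(fact: Fact) -> List[str]:
    """Serialize a fact as h, r, t, k1, v1, k2, v2, ... (assumed ordering)."""
    head, rel, tail, qualifiers = fact
    tokens = [head, rel, tail]
    for key, value in qualifiers:
        tokens.extend([key, value])
    return tokens


def forward_mask(tokens: Sequence[str], t: float, rng: random.Random) -> List[str]:
    """Forward (noising) process: each token is independently replaced by MASK
    with probability t. t=0 keeps the fact intact; t=1 masks every component."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def generate(
    query: Sequence[str],
    predict: Callable[[List[str], int], str],
    steps: int,
) -> List[str]:
    """Reverse (denoising) process: over `steps` rounds, fill an equal share of
    the remaining masked positions. Observed tokens are never overwritten, so
    predictions are conditioned on the partially observed fact."""
    seq = list(query)
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        # The last round (steps - step == 1) fills everything that is left.
        n = max(1, len(masked) // (steps - step))
        for i in masked[:n]:  # left-to-right order, for simplicity
            seq[i] = predict(seq, i)
    return seq


def toy_predict(seq: List[str], i: int) -> str:
    """Hypothetical placeholder for a learned model of p(token_i | seq)."""
    return f"<slot{i}>"


if __name__ == "__main__":
    fact: Fact = ("Marie_Curie", "award_received", "Nobel_Prize_in_Physics",
                  [("point_in_time", "1903"), ("together_with", "Pierre_Curie")])
    tokens = flatten(fact)
    completion_query = forward_mask(tokens, t=0.5, rng=random.Random(0))
    scratch_query = [MASK] * len(tokens)  # generating a fact from scratch
    print(generate(completion_query, toy_predict, steps=3))
    print(generate(scratch_query, toy_predict, steps=3))
```

Link prediction corresponds to a query with exactly one `[MASK]`, while fact generation allows any subset, including all positions, to be masked. This is why a single masked-diffusion objective can cover both tasks.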