🤖 AI Summary
This work addresses the challenge of knowledge graph generation by modeling global semantic dependencies among triples while respecting domain-specific constraints—tasks at which conventional link prediction methods often fail because they cannot ensure subgraph-level consistency. The authors propose ARK, an autoregressive knowledge graph generation framework, along with its variational extension SAIL; both treat a knowledge graph as a sequence of (head, relation, tail) triples. By leveraging RNN or Transformer architectures, the models implicitly learn semantic constraints such as type consistency, temporal validity, and relational patterns, enabling both unconditional and conditional generation without explicit rule-based supervision. Evaluated on the IntelliGraphs benchmark, the generated graphs achieve semantic validity rates of 89.2%–100.0% and exhibit novel structures absent from the training data. Experiments further reveal that hidden dimensions of at least 64 matter more than model depth, and that RNNs offer significantly improved efficiency while maintaining competitive validity.
📝 Abstract
Knowledge Graph (KG) generation requires models to learn complex semantic dependencies between triples while maintaining domain validity constraints. Unlike link prediction, which scores triples independently, generative models must capture interdependencies across entire subgraphs to produce semantically coherent structures. We present ARK (Auto-Regressive Knowledge Graph Generation), a family of autoregressive models that generate KGs by treating graphs as sequences of (head, relation, tail) triples. ARK learns implicit semantic constraints directly from data, including type consistency, temporal validity, and relational patterns, without explicit rule supervision. On the IntelliGraphs benchmark, our models achieve 89.2% to 100.0% semantic validity across diverse datasets while generating novel graphs not seen during training. We also introduce SAIL, a variational extension of ARK that enables controlled generation through learned latent representations, supporting both unconditional sampling and conditional completion from partial graphs. Our analysis reveals that model capacity (hidden dimensionality ≥ 64) is more critical than architectural depth for KG generation, with recurrent architectures achieving comparable validity to transformer-based alternatives while offering substantial computational efficiency. These results demonstrate that autoregressive models provide an effective framework for KG generation, with practical applications in knowledge base completion and query answering.
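The core idea — flattening a graph into a token sequence [h₁, r₁, t₁, h₂, r₂, t₂, …] and generating it left to right — can be illustrated with a deliberately minimal sketch. The bigram count model below is a crude stand-in for the paper's RNN/Transformer (which condition on the full prefix), and the entity and relation names are invented for the example; only the sequence factorization itself reflects the described approach.

```python
import random
from collections import defaultdict, Counter

BOS, EOS = "<bos>", "<eos>"

def flatten(graph):
    """Serialize a KG (list of (head, relation, tail) triples) into one token sequence."""
    tokens = [BOS]
    for h, r, t in graph:
        tokens += [h, r, t]
    tokens.append(EOS)
    return tokens

def train_bigram(graphs):
    """Count next-token frequencies: P(x_i | x_{i-1}) is a toy stand-in for the
    full autoregressive factorization P(x_i | x_1, ..., x_{i-1})."""
    counts = defaultdict(Counter)
    for g in graphs:
        toks = flatten(g)
        for prev, nxt in zip(toks, toks[1:]):
            counts[prev][nxt] += 1
    return counts

def sample_graph(counts, rng, max_tokens=30):
    """Generate a new graph one token at a time, then regroup tokens into triples."""
    tokens, cur = [], BOS
    while len(tokens) < max_tokens:
        choices = counts[cur]
        cur = rng.choices(list(choices), weights=choices.values())[0]
        if cur == EOS:
            break
        tokens.append(cur)
    # Drop any trailing incomplete triple before regrouping.
    usable = len(tokens) - len(tokens) % 3
    return [tuple(tokens[i:i + 3]) for i in range(0, usable, 3)]

# Hypothetical training graphs (person -works_at-> org, person -lives_in-> city).
train = [
    [("alice", "works_at", "acme"), ("bob", "lives_in", "paris")],
    [("carol", "works_at", "initech"), ("dave", "lives_in", "austin")],
]
model = train_bigram(train)
new_graph = sample_graph(model, random.Random(0))
print(new_graph)  # e.g. a recombination such as [("alice", "works_at", "initech"), ...]
```

Because sampling recombines heads and tails seen in different training graphs, the output can be a novel graph absent from the training data — a toy analogue of the novelty result reported on IntelliGraphs. A bigram model can also lose triple alignment on richer data, which is precisely why the paper's full-context RNN/Transformer architectures are needed.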