🤖 AI Summary
Existing knowledge graph distillation methods are designed largely for static graphs and struggle to model the dynamic relations in temporal knowledge graphs, which limits the quality of the distilled models. This work proposes a large language model (LLM)-assisted temporal knowledge distillation framework that, for the first time, integrates an LLM as an auxiliary teacher alongside a high-capacity temporal teacher model. By drawing on the LLM's background knowledge and temporal signals, and by using a staged alignment strategy to fuse the guidance from both teachers, the framework substantially strengthens the temporal reasoning capability of lightweight student models without adding inference overhead. Experiments on multiple public temporal knowledge graph benchmarks show that the proposed method consistently outperforms existing distillation approaches on link prediction while keeping the student model compact and efficient.
📝 Abstract
Temporal knowledge graphs (TKGs) support reasoning over time-evolving facts, yet state-of-the-art models are often computationally heavy and costly to deploy. Existing compression and distillation techniques are largely designed for static graphs; directly applying them to temporal settings may overlook time-dependent interactions and lead to performance degradation. We propose an LLM-assisted distillation framework specifically designed for temporal knowledge graph reasoning. Beyond a conventional high-capacity temporal teacher, we incorporate a large language model as an auxiliary instructor to provide enriched supervision. The LLM supplies broad background knowledge and temporally informed signals, enabling a lightweight student to better model event dynamics without increasing inference-time complexity. Training is conducted by jointly optimizing supervised and distillation objectives, using a staged alignment strategy to progressively integrate guidance from both teachers. Extensive experiments on multiple public TKG benchmarks with diverse backbone architectures demonstrate that the proposed approach consistently improves link prediction performance over strong distillation baselines, while maintaining a compact and efficient student model. The results highlight the potential of large language models as effective teachers for transferring temporal reasoning capability to resource-efficient TKG systems.
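The training recipe described in the abstract (a supervised link prediction objective combined with distillation from two teachers, blended by a staged schedule) can be pictured with a short sketch. The PyTorch snippet below is a minimal, hypothetical illustration rather than the authors' implementation: the scoring tensors, the temperature, and the stage-dependent weights `alpha_temporal` and `alpha_llm` are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def staged_weights(epoch: int, total_epochs: int):
    """Hypothetical staged alignment schedule: rely on the temporal teacher
    early in training, then progressively blend in the LLM's guidance."""
    progress = epoch / max(total_epochs - 1, 1)
    alpha_temporal = 1.0 - 0.5 * progress   # assumed: decays from 1.0 to 0.5
    alpha_llm = 0.5 * progress               # assumed: grows from 0.0 to 0.5
    return alpha_temporal, alpha_llm


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Standard soft-label (KL) distillation over candidate entity scores."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2


def training_step(student_logits, temporal_teacher_logits, llm_teacher_logits,
                  gold_entities, epoch, total_epochs):
    """Joint objective: supervised cross-entropy plus dual-teacher distillation,
    fused with stage-dependent weights."""
    ce = F.cross_entropy(student_logits, gold_entities)
    a_temp, a_llm = staged_weights(epoch, total_epochs)
    kd_temporal = distillation_loss(student_logits, temporal_teacher_logits)
    kd_llm = distillation_loss(student_logits, llm_teacher_logits)
    return ce + a_temp * kd_temporal + a_llm * kd_llm


# Toy usage: 8 link prediction queries scored against 100 candidate entities.
if __name__ == "__main__":
    student = torch.randn(8, 100, requires_grad=True)
    temporal_teacher = torch.randn(8, 100)
    llm_teacher = torch.randn(8, 100)
    gold = torch.randint(0, 100, (8,))
    loss = training_step(student, temporal_teacher, llm_teacher, gold,
                         epoch=3, total_epochs=10)
    loss.backward()
    print(f"joint loss: {loss.item():.4f}")
```

The key point the sketch captures is that only the student's logits receive gradients, so the two teachers add cost at training time but leave inference-time complexity unchanged.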