🤖 AI Summary
This work addresses the inherent granularity mismatch between large language models, which operate on token sequences, and knowledge graphs, whose fundamental units are structured entities. This mismatch prevents existing approaches from simultaneously preserving semantic richness and structural integrity. To bridge the gap, the authors propose the KGT framework, which introduces dedicated entity tokens, a relation-guided gating fusion mechanism, and a decoupled semantic-structural prediction head, enabling end-to-end, full-space knowledge graph completion without retraining from scratch. Notably, KGT is the first method to unify the representation spaces of textual and graph-structured data. Extensive experiments show that it significantly outperforms state-of-the-art baselines across multiple benchmarks, achieving higher prediction accuracy and better generalization.
📝 Abstract
Leveraging Large Language Models (LLMs) for Knowledge Graph Completion (KGC) is promising but hindered by a fundamental granularity mismatch: LLMs operate on fragmented token sequences, whereas the fundamental units of knowledge graphs (KGs) are entities. Existing approaches typically constrain predictions to limited candidate sets, or align entities with the LLM's vocabulary by pooling multiple tokens or decomposing entities into fixed-length token sequences; both strategies fail to capture the semantic meaning of the text and the structural integrity of the graph at the same time. To address this, we propose KGT, a novel framework that uses dedicated entity tokens to enable efficient, full-space prediction. Specifically, we first introduce a specialized tokenization that constructs feature representations at the level of dedicated entity tokens. We then fuse pre-trained structural and textual features into these unified embeddings via a relation-guided gating mechanism, avoiding training from scratch. Finally, we implement decoupled prediction with independent heads that separate, and then combine, semantic and structural reasoning. Experimental results show that KGT consistently outperforms state-of-the-art methods across multiple benchmarks.
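To make the pipeline concrete, here is a minimal NumPy sketch of how a relation-guided gating fusion and decoupled semantic/structural scoring could look. All names, shapes, and the additive score combination are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy embedding dimension (assumed, for illustration only)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical pre-trained features for one (head, relation, ?) query:
e_struct = rng.normal(size=d)   # structural embedding of the head entity
e_text   = rng.normal(size=d)   # textual embedding of the head entity
e_rel    = rng.normal(size=d)   # embedding of the query relation

# Relation-guided gating fusion: the relation decides, per dimension,
# how much structural vs. textual signal enters the fused entity token.
W_gate = rng.normal(size=(d, d)) * 0.1   # hypothetical learned gate weights
gate = sigmoid(W_gate @ e_rel)           # gate values lie in (0, 1)
fused = gate * e_struct + (1.0 - gate) * e_text

# Decoupled prediction: independent semantic and structural heads score
# every candidate entity token (full-space), then scores are combined.
n_entities = 5
E_sem    = rng.normal(size=(n_entities, d))  # semantic entity-token table
E_struct = rng.normal(size=(n_entities, d))  # structural entity-token table
scores = E_sem @ fused + E_struct @ fused    # one score per entity
pred = int(np.argmax(scores))                # predicted tail entity id
```

The gate lets each relation weight the two feature sources differently per dimension, which is one plausible reading of "relation-guided"; the two score terms keep semantic and structural reasoning in separate heads before combining them.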