🤖 AI Summary
This work addresses the modality gap between continuous graph embeddings and discrete tokens that hinders large language models (LLMs) in knowledge graph completion. To bridge this gap, the authors propose GS-Quant, a novel framework that introduces hierarchical semantic structure and causal dependencies into the discrete entity encoding process. Specifically, a granularity-aware semantic enhancement module injects coarse-to-fine hierarchical knowledge, while a generative structural reconstruction module establishes causal dependencies among code sequences, yielding hierarchical discrete codes that jointly preserve semantic coherence and structural generativity. The resulting structured semantic descriptors are integrated into the LLM’s vocabulary, enabling natural-language-like graph reasoning. Experiments demonstrate that GS-Quant significantly outperforms existing text-based and embedding-based baselines on knowledge graph completion, effectively narrowing the representational divide between graph structures and language generation.
📝 Abstract
Large Language Models (LLMs) have shown immense potential in Knowledge Graph Completion (KGC), yet bridging the modality gap between continuous graph embeddings and discrete LLM tokens remains a critical challenge. While recent quantization-based approaches attempt to align these modalities, they typically treat quantization as flat numerical compression, resulting in semantically entangled codes that fail to mirror the hierarchical nature of human reasoning. In this paper, we propose GS-Quant, a novel framework that generates semantically coherent and structurally stratified discrete codes for KG entities. Unlike prior methods, GS-Quant is grounded in the insight that entity representations should follow the coarse-to-fine logic of language. We introduce a Granular Semantic Enhancement module that injects hierarchical knowledge into the codebook, ensuring that earlier codes capture global semantic categories while later codes refine specific attributes. Furthermore, a Generative Structural Reconstruction module imposes causal dependencies on the code sequence, transforming independent discrete units into structured semantic descriptors. By expanding the LLM vocabulary with these learned codes, we enable the model to reason over graph structures in a manner that mirrors natural language generation. Experimental results demonstrate that GS-Quant significantly outperforms existing text-based and embedding-based baselines. Our code is publicly available at https://github.com/mikumifa/GS-Quant.
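To make the coarse-to-fine idea concrete, the following is a minimal sketch of how a continuous entity embedding could be turned into a short sequence of discrete codes and then into vocabulary tokens. It uses plain residual quantization as an assumed stand-in: the abstract does not state which quantizer GS-Quant uses, and the two modules it describes are not reproduced here. The function names, the toy codebooks, and the `<kg_level_index>` token format are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: residual quantization is an assumed stand-in,
# not the GS-Quant method itself. All names and values are hypothetical.

def nearest(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def dist(entry):
        return sum((v - e) ** 2 for v, e in zip(vec, entry))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def residual_quantize(embedding, codebooks):
    """Quantize level by level; each level encodes what the previous one left over."""
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        idx = nearest(residual, codebook)
        codes.append(idx)
        # Subtract the chosen entry so the next level only sees the remainder.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes

def codes_to_tokens(codes):
    """Render codes as hypothetical vocabulary tokens, one per level."""
    return [f"<kg_{level}_{idx}>" for level, idx in enumerate(codes)]

# Toy codebooks: level 0 holds coarse centroids, level 1 holds small refinements.
CODEBOOKS = [
    [[0.0, 0.0], [10.0, 10.0]],
    [[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
]

codes = residual_quantize([10.9, 10.1], CODEBOOKS)
tokens = codes_to_tokens(codes)
print(codes, tokens)
```

In this toy setup, two embeddings that fall near the same level-0 centroid share their first code and differ only in later codes, which is the kind of prefix structure the abstract attributes to earlier versus later codes. Tokens of this form would still need to be added to the LLM tokenizer and embedding table, a step this sketch leaves out.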