ReaLM: Residual Quantization Bridging Knowledge Graph Embeddings and Large Language Models

📅 2025-10-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM-based knowledge graph completion (KGC) methods suffer from a semantic gap between structured KG embeddings, which live in continuous vector spaces, and the discrete token space of large language models (LLMs), leading to inefficient knowledge transfer. To address this, the authors propose ReaLM, a framework that (1) discretizes pre-trained KG embeddings into compact, learnable code sequences via residual vector quantization (RVQ); (2) enforces ontology-guided class constraints to preserve semantic consistency; and (3) enables end-to-end joint optimization of KG representations and LLMs. ReaLM establishes, for the first time, a differentiable and trainable alignment pathway bridging symbolic structural knowledge and contextual language modeling. Extensive experiments on FB15k-237 and WN18RR demonstrate significant improvements over state-of-the-art methods, validating the effectiveness of deeply integrating structured semantics with linguistic reasoning.
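The core discretization step, residual vector quantization, can be sketched as follows: at each stage the nearest codeword to the current residual is selected, and the residual is updated before the next stage. This is a minimal illustrative sketch; the function names and codebook layout are assumptions, not the paper's actual implementation (which additionally makes the codes learnable tokens in the LLM vocabulary).

```python
import numpy as np

def residual_vector_quantize(embedding, codebooks):
    """Quantize an embedding into a sequence of code indices.

    At each stage, pick the codeword nearest to the current residual,
    record its index, and subtract it before the next stage.
    codebooks: list of arrays, each of shape (num_codes, dim).
    """
    codes = []
    residual = np.asarray(embedding, dtype=np.float64).copy()
    for codebook in codebooks:
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))
        codes.append(idx)
        residual = residual - codebook[idx]
    return codes, residual  # residual is the final quantization error

def dequantize(codes, codebooks):
    # Reconstruct by summing the selected codeword from each stage.
    return sum(cb[i] for cb, i in zip(codebooks, codes))
```

Each entity embedding thus maps to a short code sequence (one index per stage), which is the form the LLM can consume as discrete tokens.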

📝 Abstract
Large Language Models (LLMs) have recently emerged as a powerful paradigm for Knowledge Graph Completion (KGC), offering strong reasoning and generalization capabilities beyond traditional embedding-based approaches. However, existing LLM-based methods often struggle to fully exploit structured semantic representations, as the continuous embedding space of pretrained KG models is fundamentally misaligned with the discrete token space of LLMs. This discrepancy hinders effective semantic transfer and limits their performance. To address this challenge, we propose ReaLM, a novel and effective framework that bridges the gap between KG embeddings and LLM tokenization through the mechanism of residual vector quantization. ReaLM discretizes pretrained KG embeddings into compact code sequences and integrates them as learnable tokens within the LLM vocabulary, enabling seamless fusion of symbolic and contextual knowledge. Furthermore, we incorporate ontology-guided class constraints to enforce semantic consistency, refining entity predictions based on class-level compatibility. Extensive experiments on two widely used benchmark datasets demonstrate that ReaLM achieves state-of-the-art performance, confirming its effectiveness in aligning structured knowledge with large-scale language models.
Problem

Research questions and friction points this paper is trying to address.

Bridging knowledge graph embeddings with large language models
Aligning continuous embedding space with discrete token space
Enhancing semantic transfer for knowledge graph completion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Residual quantization bridges KG embeddings and LLMs
Discretizes KG embeddings into learnable LLM tokens
Enforces semantic consistency via ontology-guided constraints
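The ontology-guided constraint amounts to filtering candidate entity predictions by class-level compatibility with the query relation. A minimal sketch, assuming hypothetical lookup tables for relation range classes and entity classes (the names below are illustrative, not from the paper):

```python
def filter_by_class(candidates, relation, range_classes, entity_class):
    """Keep only candidates whose class is compatible with the
    relation's expected range class.

    range_classes: dict mapping relation -> set of allowed classes.
    entity_class:  dict mapping entity -> its ontology class.
    Relations with no recorded range constraint pass all candidates.
    """
    allowed = range_classes.get(relation)
    if allowed is None:
        return candidates
    return [e for e in candidates if entity_class.get(e) in allowed]
```

In practice such a filter would be applied to (or used to re-rank) the LLM's ranked entity predictions, discarding class-incompatible answers before scoring.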