KoRe: Compact Knowledge Representations for Large Language Models

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models encode knowledge implicitly within their parameters, which yields opaque representations that are difficult to edit and prone to hallucination. To address this, this work proposes KoRe, a method that uses knowledge graph embeddings to encode one-hop subgraphs into compact, discrete knowledge tokens, which are then injected into a large language model without extensive retraining. KoRe reduces the token overhead of knowledge representation by up to an order of magnitude while also improving editability and interpretability. The approach achieves competitive performance on three established benchmarks, showing that structured external knowledge can be integrated into language models in a lightweight and controllable manner.

📝 Abstract

Modern Large Language Models (LLMs) have shown impressive performance in user-facing tasks such as question answering, as well as consistent improvements in reasoning capabilities. Still, the way these models encode knowledge seems fundamentally flawed: by design, LLMs encode world knowledge within their parameters. This way of representing knowledge is inherently opaque, difficult to debug and update, and prone to hallucinations. On the other hand, Knowledge Graphs (KGs) can provide human-readable and easily editable world knowledge representations, and their application in knowledge-intensive tasks has consistently proven beneficial to downstream performance. Nonetheless, current integration techniques require extensive retraining or finetuning. To overcome this issue, we introduce KoRe, a methodology to encode 1-hop subgraphs into compact discrete knowledge tokens and inject them into an LLM backbone. We test the proposed approach on three established benchmarks, and report competitive performance coupled with a significant reduction (up to 10x) in token usage. Our results show that compact discrete KG representations can be used efficiently and effectively to ground modern LLMs.
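The abstract does not say how the discrete knowledge tokens are produced, so the following is only a minimal sketch of the general idea, not KoRe's actual procedure. It assumes toy random embeddings in place of trained KG embeddings, mean pooling over a 1-hop subgraph, and nearest-neighbor lookup in a small codebook. The triples, the sizes, the `<kg_i>` token format, and every function name are hypothetical.

```python
import random

# Toy knowledge graph as (head, relation, tail) triples. Illustrative data only.
TRIPLES = [
    ("Paris", "capital_of", "France"),
    ("Paris", "located_in", "Europe"),
    ("Paris", "has_landmark", "Eiffel Tower"),
    ("Berlin", "capital_of", "Germany"),
]

DIM = 8             # embedding size (assumed)
CODEBOOK_SIZE = 16  # number of discrete knowledge tokens (assumed)


def toy_embedding(name, dim=DIM):
    """Deterministic stand-in for a trained KG embedding (hypothetical)."""
    rng = random.Random(name)  # seeding with a string is reproducible
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def one_hop_subgraph(entity, triples):
    """Return every triple whose head is `entity`."""
    return [t for t in triples if t[0] == entity]


def subgraph_vector(subgraph):
    """Mean-pool (relation + tail) embeddings; the pooling choice is an assumption."""
    if not subgraph:
        raise ValueError("entity has no outgoing triples")
    vecs = []
    for _, rel, tail in subgraph:
        r, t = toy_embedding(rel), toy_embedding(tail)
        vecs.append([a + b for a, b in zip(r, t)])
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def quantize(vec, codebook):
    """Index of the nearest codebook entry by squared Euclidean distance."""
    def dist(code):
        return sum((a - b) ** 2 for a, b in zip(vec, code))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


# Random codebook; a real system would learn or fit these entries.
CODEBOOK = [toy_embedding(f"code_{i}") for i in range(CODEBOOK_SIZE)]


def knowledge_token(entity, triples=TRIPLES):
    """Map an entity's 1-hop subgraph to a single discrete token string."""
    vec = subgraph_vector(one_hop_subgraph(entity, triples))
    return f"<kg_{quantize(vec, CODEBOOK)}>"


def verbalized(entity, triples=TRIPLES):
    """Baseline: spell the same subgraph out as plain text."""
    return " ".join(
        f"{h} {r.replace('_', ' ')} {t}." for h, r, t in one_hop_subgraph(entity, triples)
    )


if __name__ == "__main__":
    text = verbalized("Paris")
    print(text, "->", len(text.split()), "whitespace-separated words")
    print(knowledge_token("Paris"), "-> 1 knowledge token")
```

In this toy setting the verbalized subgraph costs one word per element of each triple, while the quantized form is a single symbol that could be prepended to a prompt. A real system would count tokens with the backbone's tokenizer and would need the LLM to be able to interpret the new tokens, which this sketch does not address.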

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Knowledge Representation

Knowledge Graphs

Hallucination

Model Interpretability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge Graph

Discrete Knowledge Tokens

Large Language Models