AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Augmenting LLMs with knowledge from large-scale knowledge graphs (KGs) containing billions of triples faces several obstacles: high GPU memory consumption (over 20 GB), reliance on external retrievers, long-context dependencies, and substantial inference latency. This paper proposes AtlasKV, the first parametric, end-to-end KG integration method that scales to billion-triple KGs. Its core innovations are: (i) KG2KV and HiKVP, techniques that implicitly encode triples into the LLM's key-value cache and leverage the native attention mechanism for joint knowledge storage and retrieval; (ii) sub-linear time and space complexity, with GPU memory usage under 20 GB; and (iii) zero-shot adaptation to new knowledge, requiring no fine-tuning, external retrieval modules, or lengthy context inputs. Evaluated on multiple knowledge-intensive tasks, AtlasKV matches or surpasses RAG in accuracy while significantly reducing inference latency and enabling dynamic knowledge expansion.
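The core KG2KV idea, storing triples as attention key-value pairs and retrieving them via dot-product scoring, can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the hash-based embedding, the `TripleKVCache` class, and the lookup logic are all assumptions standing in for the LLM's actual key/query projections and KV cache.

```python
import zlib
import numpy as np

DIM = 64

def embed(text: str) -> np.ndarray:
    """Toy deterministic embedding (stand-in for the LLM's key/query projections)."""
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    v = rng.standard_normal(DIM)
    return v / np.linalg.norm(v)

class TripleKVCache:
    """Sketch of the KG2KV idea: (head, relation) becomes a key vector,
    the tail entity is the value returned on lookup."""
    def __init__(self):
        self.keys = []    # key vectors for (head, relation) pairs
        self.tails = []   # tail entities, one per key

    def add_triple(self, head: str, relation: str, tail: str) -> None:
        # New knowledge is appended directly to the cache: no retraining needed.
        self.keys.append(embed(f"{head} | {relation}"))
        self.tails.append(tail)

    def query(self, head: str, relation: str, top_k: int = 1) -> list[str]:
        """Attention-style lookup: score all keys against the query, return top-k tails."""
        q = embed(f"{head} | {relation}")
        scores = np.stack(self.keys) @ q          # dot-product attention logits
        idx = np.argsort(scores)[::-1][:top_k]    # highest-scoring keys first
        return [self.tails[i] for i in idx]
```

In the real method the keys and values live inside the model's KV cache and are scored by the LLM's own attention heads; this sketch only mirrors the storage-plus-attention-lookup structure.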

📝 Abstract
Retrieval-augmented generation (RAG) has shown some success in augmenting large language models (LLMs) with external knowledge. However, as a non-parametric knowledge integration paradigm for LLMs, RAG methods heavily rely on external retrieval modules and the retrieved textual context prior. Especially for very large scale knowledge augmentation, they introduce substantial inference latency due to expensive searches and much longer relevant contexts. In this paper, we propose a parametric knowledge integration method, called AtlasKV, a scalable, effective, and general way to augment LLMs with billion-scale knowledge graphs (KGs) (e.g., 1B triples) at very little GPU memory cost (e.g., less than 20 GB VRAM). In AtlasKV, we introduce KG2KV and HiKVP to integrate KG triples into LLMs at scale with sub-linear time and memory complexity. It maintains strong knowledge grounding and generalization performance using the LLMs' inherent attention mechanism, and requires no external retrievers, long context priors, or retraining when adapting to new knowledge.
Problem

Research questions and friction points this paper is trying to address.

Augmenting LLMs with billion-scale knowledge graphs efficiently
Reducing inference latency in large-scale knowledge integration
Eliminating external retrievers and long context dependencies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates billion-scale knowledge graphs into LLMs parametrically
Uses KG2KV and HiKVP for sub-linear memory complexity
Eliminates external retrievers and long context dependencies
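The sub-linear complexity credited to HiKVP can be illustrated with a generic two-level, cluster-then-scan search: score a small set of cluster centroids first, then scan only the most promising clusters instead of every key. The seed-based clustering, function name, and parameters below are toy assumptions for illustration, not the paper's actual hierarchical KV pruning.

```python
import numpy as np

def two_level_search(keys: np.ndarray, q: np.ndarray,
                     n_clusters: int = 8, probe: int = 2) -> int:
    """Return the index of the best-matching key, scanning only `probe` clusters.

    Scoring n_clusters centroids plus the keys inside the probed clusters
    touches far fewer vectors than an exhaustive scan over all keys.
    """
    rng = np.random.default_rng(0)
    seeds = rng.choice(len(keys), n_clusters, replace=False)
    centroids = keys[seeds]                        # toy centroids (real: k-means)
    labels = np.argmax(keys @ centroids.T, axis=1) # assign each key to a cluster
    best = np.argsort(centroids @ q)[::-1][:probe] # most promising clusters
    cand = np.where(np.isin(labels, best))[0]      # only ~probe/n_clusters of keys
    return int(cand[np.argmax(keys[cand] @ q)])    # exact scan within candidates
```

Setting `probe == n_clusters` probes every cluster and reduces to exhaustive search, which makes the speed/recall trade-off explicit: smaller `probe` means fewer keys scanned at some risk of missing the global best match.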