π€ AI Summary
To address the limitations of external retrieval dependency, high computational overhead, and infrequent knowledge updates in large language model (LLM) knowledge enhancement, this paper proposes KBLaMβa retrieval-free, fine-tuning-free knowledge injection framework. KBLaM encodes structured knowledge bases (>10K triples) into continuous key-value vector pairs and integrates them end-to-end into an 8B-parameter LLM via a customized rectangular attention mechanism, enabling dynamic, single-GPU knowledge loading and real-time updates. Its computational complexity scales linearly with knowledge base size and supports interpretable tracking of knowledge usage. Experiments demonstrate that KBLaM significantly outperforms retrieval-augmented generation (RAG) and in-context learning (ICL) baselines on question answering and open-ended reasoning tasks. Deployed on a single A100 GPU, it achieves low latency and high interpretability, establishing the first efficient, scalable, plug-and-play paradigm for internalizing structured knowledge into LLMs.
π Abstract
In this paper, we propose Knowledge Base augmented Language Model (KBLaM), a new method for augmenting Large Language Models (LLMs) with external knowledge. KBLaM works with a knowledge base (KB) constructed from a corpus of documents, transforming each piece of knowledge in the KB into continuous key-value vector pairs via pre-trained sentence encoders with linear adapters and integrating them into pre-trained LLMs via a specialized rectangular attention mechanism. Unlike Retrieval-Augmented Generation, KBLaM eliminates external retrieval modules, and unlike in-context learning, its computational overhead scales linearly with KB size rather than quadratically. Our approach enables integrating a large KB of more than 10K triples into an 8B pre-trained LLM of only 8K context window on one single A100 80GB GPU and allows for dynamic updates without model fine-tuning or retraining. Experiments demonstrate KBLaM's effectiveness in various tasks, including question-answering and open-ended reasoning, while providing interpretable insights into its use of the augmented knowledge. Code and datasets are available at https://github.com/microsoft/KBLaM/