AI Summary
To address the high computational overhead, shallow knowledge integration, strong reliance on long contexts, and degraded inference performance caused by injecting external knowledge solely through the input context in retrieval-augmented generation (RAG), this paper proposes Parametric RAG, a new parametric RAG paradigm. Parametric RAG is the first to map retrieved documents into learnable parameters and inject them directly into the feed-forward network (FFN) layers of large language models (LLMs), enabling deep coupling between external knowledge and model parameters. The method combines document parameterization, FFN-layer adaptation, parameter-efficient fine-tuning, and joint retrieval-generation optimization. Experiments demonstrate significant improvements in accuracy and inference efficiency on knowledge-intensive tasks, reduced dependence on long-context windows, and complementary gains when combined with conventional in-context RAG. All code, datasets, and models are publicly released.
Abstract
Retrieval-augmented generation (RAG) techniques have emerged as a promising solution for enhancing the reliability of large language models (LLMs) by addressing issues such as hallucinations, outdated knowledge, and domain adaptation. In particular, existing RAG methods append relevant documents retrieved from external corpora or databases to the input of LLMs to guide their generation, an approach we refer to as in-context knowledge injection. While simple and often effective, this approach has inherent limitations. First, increasing the context length and the number of retrieved documents leads to higher computational overhead and degraded performance, especially on complex reasoning tasks. More importantly, in-context knowledge injection operates primarily at the input level, whereas LLMs store their internal knowledge in their parameters; this gap fundamentally limits the capacity of in-context methods. To bridge this gap, we introduce parametric retrieval-augmented generation (Parametric RAG), a new RAG paradigm that integrates external knowledge directly into the parameters of the feed-forward networks (FFNs) of an LLM through document parameterization. This approach not only saves online computational costs by eliminating the need to inject multiple documents into the LLM's input context, but also deepens the integration of external knowledge into the parametric knowledge space of the LLM. Experimental results demonstrate that Parametric RAG substantially enhances both the effectiveness and the efficiency of knowledge augmentation in LLMs, and that it can be combined with in-context RAG methods to achieve even better performance. We have open-sourced all code, data, and models at the following anonymized GitHub link: https://github.com/oneal2000/PRAG
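To make the core idea concrete, the sketch below illustrates in numpy what "injecting a parameterized document into an FFN layer" could look like. This is a minimal illustration under stated assumptions, not the paper's implementation: the tiny layer dimensions, the `parameterize_document` placeholder (which derives a deterministic low-rank delta from a document id rather than learning one offline by fine-tuning, as Parametric RAG does), and the LoRA-style additive merge `W' = W + A @ B` are all assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for a tiny FFN up-projection (not the paper's sizes).
d_model, d_ff, rank = 8, 32, 2

# Base FFN weight of the (frozen) LLM.
W_up = rng.standard_normal((d_ff, d_model))

def parameterize_document(doc_id: int):
    """Stand-in for offline document parameterization.

    In Parametric RAG a low-rank delta would be *learned* per document via
    parameter-efficient fine-tuning; here we derive a deterministic
    placeholder from the doc id purely for illustration.
    """
    doc_rng = np.random.default_rng(doc_id)
    A = doc_rng.standard_normal((d_ff, rank)) * 0.01
    B = doc_rng.standard_normal((rank, d_model)) * 0.01
    return A, B

def inject(W: np.ndarray, deltas) -> np.ndarray:
    """Merge retrieved documents' low-rank deltas into the FFN weight:
    W' = W + sum_i A_i @ B_i  (an assumed LoRA-style additive merge)."""
    W_merged = W.copy()
    for A, B in deltas:
        W_merged += A @ B
    return W_merged

# "Retrieve" two documents and fold them into the layer's parameters.
retrieved = [parameterize_document(d) for d in (3, 7)]
W_aug = inject(W_up, retrieved)

# Note the input is unchanged: the knowledge lives in the weights,
# so no extra context tokens are consumed at inference time.
x = rng.standard_normal(d_model)
h = np.maximum(W_aug @ x, 0.0)  # FFN up-projection + ReLU
print(h.shape)                  # -> (32,)
```

The key contrast with in-context RAG is visible in the last lines: the forward pass sees the same input length regardless of how many documents were injected, since each document contributes only a weight delta rather than prompt tokens.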