Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models

πŸ“… 2025-08-13
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address catastrophic forgetting in parameter-efficient fine-tuning, the high inference latency of retrieval-augmented generation (RAG), and the prohibitive cost of domain-specific pretraining for large language models (LLMs), this paper proposes Memory Decoder: a plug-and-play pretrained memory module that leaves the base model's parameters untouched. Its core contribution is a portable, lightweight Transformer decoder that emulates non-parametric retrieval behavior, coupled with a domain-adaptive training strategy that enables low-overhead, low-latency memory injection. The module is deployable across diverse LLM architectures (e.g., Qwen, Llama) and domains (e.g., biomedicine, finance, law). Extensive experiments demonstrate an average perplexity reduction of 6.17 points across multiple models and tasks, validating its efficiency, stability, and strong generalization.

πŸ“ Abstract
Large Language Models (LLMs) have shown strong abilities in general language tasks, yet adapting them to specific domains remains a challenge. Current methods like Domain-Adaptive Pretraining (DAPT) require costly full-parameter training and suffer from catastrophic forgetting. Meanwhile, Retrieval-Augmented Generation (RAG) introduces substantial inference latency due to expensive nearest-neighbor searches and longer contexts. This paper introduces Memory Decoder, a plug-and-play pretrained memory that enables efficient domain adaptation without changing the original model's parameters. Memory Decoder employs a small Transformer decoder that learns to imitate the behavior of an external non-parametric retriever. Once trained, it can be seamlessly integrated with any pretrained language model that shares the same tokenizer, requiring no model-specific modifications. Experimental results demonstrate that Memory Decoder enables effective adaptation of various Qwen and Llama models to three distinct specialized domains (biomedicine, finance, and law), reducing perplexity by an average of 6.17 points. Overall, Memory Decoder introduces a novel paradigm centered on a specially pretrained memory component designed for domain-specific adaptation. This memory architecture can be integrated in a plug-and-play manner, consistently enhancing performance across multiple models within the target domain.
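The abstract does not spell out the integration mechanism, but a memory module that imitates a non-parametric retriever and plugs into any shared-tokenizer LM suggests a kNN-LM-style combination at inference time: interpolating the memory's next-token distribution with the frozen base model's. The sketch below is a minimal illustration under that assumption; the function names and the interpolation weight `lam` are hypothetical, not the paper's actual API.

```python
# Hypothetical sketch: combining a frozen base LM with a plug-and-play memory
# module by interpolating their next-token distributions (kNN-LM style).
# This is an assumed integration scheme, not the paper's confirmed method.
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def interpolate(base_logits, memory_logits, lam=0.3):
    """Mix the base LM's distribution with the memory module's.

    lam = 0 recovers the unmodified base model; lam = 1 uses only the
    memory. Both components must share a tokenizer so that position i in
    each logit vector refers to the same vocabulary item.
    """
    p_base = softmax(base_logits)
    p_mem = softmax(memory_logits)
    return [(1 - lam) * b + lam * m for b, m in zip(p_base, p_mem)]

# Toy 3-token vocabulary: the memory shifts mass toward token 1.
probs = interpolate([2.0, 1.0, 0.1], [0.5, 2.5, 0.0], lam=0.3)
```

Because the base model's parameters are never touched, the same trained memory can be paired with any model in the family, which is consistent with the reported plug-and-play use across Qwen and Llama variants.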
Problem

Research questions and friction points this paper is trying to address.

Efficient domain adaptation for LLMs without parameter changes
Reducing inference latency compared to retrieval-augmented methods
Preventing catastrophic forgetting in specialized domain adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Plug-and-play pretrained memory for domain adaptation
Small transformer decoder imitates external retriever
Seamless integration with shared-tokenizer language models
Jiaqi Cao
Shanghai Jiao Tong University
Natural Language Processing · Long-term Memory
Jiarui Wang
LUMIA Lab, Shanghai Jiao Tong University, Shanghai, China
Rubin Wei
Shanghai Jiao Tong University
LLM · Memory-Augmented LLM
Qipeng Guo
Fudan University
Kai Chen
Shanghai AI Laboratory, Shanghai, China
Bowen Zhou
Department of Electronic Engineering, Tsinghua University, Beijing, China
Zhouhan Lin
LUMIA Lab, Shanghai Jiao Tong University, Shanghai, China; Shanghai AI Laboratory, Shanghai, China