AI Summary
To address catastrophic forgetting in parameter-efficient fine-tuning, high inference latency in retrieval-augmented generation (RAG), and the prohibitive cost of domain-specific pretraining for large language models (LLMs), this paper proposes Memory Decoder: a plug-and-play pretrained memory module that requires no modification to the base model's parameters. Its core innovation is a portable, lightweight Transformer decoder that learns to emulate the behavior of a non-parametric retriever, coupled with a domain-adaptive training strategy that enables low-overhead, low-latency memory injection. The module is deployable across diverse LLM architectures (e.g., Qwen, Llama) and domains (e.g., biomedicine, finance, law). Extensive experiments demonstrate an average perplexity reduction of 6.17 points across multiple models and tasks, validating its efficiency, stability, and strong generalization.
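To illustrate what "memory injection" without modifying the base model might look like at inference time, here is a minimal sketch in the style of kNN-LM interpolation, which the Memory Decoder is described as emulating. The function name, the interpolation weight `lam`, and the linear-blend form are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def memory_augmented_next_token(base_logits, memdec_logits, lam=0.3):
    """Blend the base LM's next-token distribution with the Memory
    Decoder's distribution via linear interpolation (kNN-LM style).

    `lam` is an assumed hyperparameter weighting the memory distribution;
    the base model's logits are left untouched, which is what makes the
    module plug-and-play for any model sharing the same tokenizer.
    """
    p_base = torch.softmax(base_logits, dim=-1)  # base model distribution
    p_mem = torch.softmax(memdec_logits, dim=-1)  # memory module distribution
    return lam * p_mem + (1.0 - lam) * p_base  # valid probability distribution
```

Because the blend happens purely in probability space over a shared vocabulary, the same trained memory module can sit beside any base LM with the same tokenizer.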
Abstract
Large Language Models (LLMs) have shown strong abilities in general language tasks, yet adapting them to specific domains remains a challenge. Current methods such as Domain-Adaptive Pretraining (DAPT) require costly full-parameter training and suffer from catastrophic forgetting, while Retrieval-Augmented Generation (RAG) introduces substantial inference latency due to expensive nearest-neighbor searches and longer contexts. This paper introduces Memory Decoder, a plug-and-play pretrained memory that enables efficient domain adaptation without changing the original model's parameters. Memory Decoder employs a small transformer decoder that learns to imitate the behavior of an external non-parametric retriever. Once trained, it can be seamlessly integrated with any pretrained language model that shares the same tokenizer, requiring no model-specific modifications. Experimental results demonstrate that Memory Decoder effectively adapts various Qwen and Llama models to three distinct specialized domains (biomedicine, finance, and law), reducing perplexity by an average of 6.17 points. Overall, Memory Decoder introduces a novel paradigm centered on a specially pretrained memory component designed for domain-specific adaptation. This memory architecture can be integrated in a plug-and-play manner, consistently enhancing performance across multiple models within the target domain.
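The abstract says the small decoder "learns to imitate the behavior of an external non-parametric retriever." One natural way to realize such imitation is to distill the retriever's next-token distribution into the decoder with a KL-divergence loss. The sketch below assumes this distillation setup and a precomputed retrieval distribution (e.g., from a kNN search over a domain datastore); the function name and loss form are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def memory_distillation_loss(memdec_logits, retrieval_probs):
    """KL(retrieval || memory decoder): pushes the small decoder's output
    distribution toward the non-parametric retriever's distribution.

    `retrieval_probs` is assumed to be a precomputed probability
    distribution over the vocabulary (e.g., softmax over kNN neighbor
    distances); the loss is zero when the decoder matches it exactly.
    """
    log_q = F.log_softmax(memdec_logits, dim=-1)  # decoder log-probs
    # F.kl_div expects log-probs as input and probs as target.
    return F.kl_div(log_q, retrieval_probs, reduction="batchmean")
```

Training against a distribution rather than a single gold token is what lets the compact decoder internalize retrieval behavior, so the expensive nearest-neighbor search can be dropped at inference time.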