MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory

📅 2024-04-17
🏛️ arXiv.org
📈 Citations: 6
Influential: 0
📄 PDF
🤖 AI Summary
Existing large language models (LLMs) rely on implicit parametric memory, leading to catastrophic forgetting of rare facts, difficulty in knowledge updating, and factual hallucinations. To address these limitations, we propose an explicit external memory augmentation framework that enables end-to-end joint fine-tuning of LLMs with readable and writable structured memory—first of its kind. Our method introduces: (1) a differentiable memory controller for explicit memory architecture; (2) a memory-aware attention mechanism; and (3) a co-training strategy for memory read and write operations. This approach supports dynamic knowledge updates, interpretable retrieval, and controllable memory editing, significantly improving factual consistency and traceability. Empirical evaluation shows an average 8.2% accuracy gain on knowledge-intensive tasks and a 12% reduction in language modeling perplexity. Crucially, memory operations become observable and intervenable, enhancing transparency and controllability.

Technology Category

Application Category

📝 Abstract
While current large language models (LLMs) perform well on many knowledge-related tasks, they are limited by relying on their parameters as an implicit storage mechanism. As a result, they struggle with memorizing rare events and with updating their memory as facts change over time. In addition, the uninterpretable nature of parametric memory makes it challenging to prevent hallucination. Model editing and augmenting LLMs with parameters specialized for memory are only partial solutions. In this paper, we introduce MemLLM, a novel method of enhancing LLMs by integrating a structured and explicit read-and-write memory module. MemLLM tackles the aforementioned challenges by enabling dynamic interaction with the memory and improving the LLM's capabilities in using stored knowledge. Our experiments indicate that MemLLM enhances the LLM's performance and interpretability, in language modeling in general and knowledge-intensive tasks in particular. We see MemLLM as an important step towards making LLMs more grounded and factual through memory augmentation.
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Rare Event Memory
Fact Updating
Innovation

Methods, ideas, or system contributions that make the work stand out.

MemLLM
Knowledge-intensive Tasks
Readable-writable Information Storage
🔎 Similar Papers
No similar papers found.