$δ$-mem: Efficient Online Memory for Large Language Models

📅 2026-05-12
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the difficulty large language models have in efficiently reusing historical information in long-horizon tasks, where extending the context window is computationally expensive and often fails to ensure effective context utilization. The authors propose δ-mem, a lightweight online memory mechanism that compresses past interactions into a fixed-size associative memory state matrix (only 8×8) updated by delta-rule learning. The memory readout generates low-rank corrections to the attention computation of a frozen full-attention backbone, coupling memory directly with attention without full fine-tuning, backbone replacement, or explicit context extension. Experiments show that δ-mem largely preserves general capabilities while raising the average score to 1.10× that of the frozen backbone and 1.15× that of the strongest non-δ-mem memory baseline, with larger gains on the memory-heavy benchmarks MemoryAgentBench (1.31×) and LoCoMo (1.20×).
📝 Abstract
Large language models increasingly need to accumulate and reuse historical information in long-term assistants and agent systems. Simply expanding the context window is costly and often fails to ensure effective context utilization. We propose $δ$-mem, a lightweight memory mechanism that augments a frozen full-attention backbone with a compact online state of associative memory. $δ$-mem compresses past information into a fixed-size state matrix updated by delta-rule learning, and uses its readout to generate low-rank corrections to the backbone's attention computation during generation. With only an $8\times8$ online memory state, $δ$-mem improves the average score to $1.10\times$ that of the frozen backbone and $1.15\times$ that of the strongest non-$δ$-mem memory baseline. It achieves larger gains on memory-heavy benchmarks, reaching $1.31\times$ on MemoryAgentBench and $1.20\times$ on LoCoMo, while largely preserving general capabilities. These results show that effective memory can be realized through a compact online state directly coupled with attention computation, without full fine-tuning, backbone replacement, or explicit context extension.
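To make the phrase "fixed-size state matrix updated by delta-rule learning" concrete, here is a minimal sketch of the textbook delta rule for an associative memory, S ← S + β (v − S k) kᵀ, with readout S k. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how keys, values, or the step size β are produced, nor how the readout becomes a low-rank attention correction, so the function names, the hand-picked vectors, and β = 1 below are all hypothetical.

```python
import math

D = 8  # state is D x D, matching the 8x8 size quoted in the abstract


def new_state(d=D):
    """Empty associative memory: a d x d matrix of zeros."""
    return [[0.0] * d for _ in range(d)]


def normalize(vec):
    """Scale a vector to unit length so a beta=1 write stores the value exactly."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def read(state, key):
    """Readout S k: the value currently associated with `key`."""
    return [sum(s * k for s, k in zip(row, key)) for row in state]


def delta_update(state, key, value, beta=1.0):
    """In-place delta-rule write: S <- S + beta * (v - S k) k^T."""
    predicted = read(state, key)
    for i, row in enumerate(state):
        error = value[i] - predicted[i]  # what the memory currently gets wrong
        for j in range(len(row)):
            row[j] += beta * error * key[j]
    return state


# Two orthogonal unit keys (hypothetical; chosen by hand for the demo).
k1 = normalize([1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
k2 = normalize([2.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
v1 = [float(i) for i in range(D)]
v2 = [10.0 - i for i in range(D)]
v1_new = [0.5] * D

state = new_state()
delta_update(state, k1, v1)
delta_update(state, k2, v2)
recalled_v1 = read(state, k1)       # recovers v1 (up to float rounding)

delta_update(state, k1, v1_new)     # overwrite the association for k1
recalled_v1_new = read(state, k1)   # now v1_new
recalled_v2 = read(state, k2)       # k2's association is untouched
```

The point of the error term (v − S k) is that a write corrects only what the memory currently predicts wrongly for that key, so an old association can be replaced without growing the state. The memory footprint stays at 64 numbers regardless of how many writes occur, which is the property the abstract contrasts with context-window expansion. With non-orthogonal keys or β < 1, recall would be approximate rather than exact.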
Problem

Research questions and friction points this paper is trying to address.

large language models
memory
context utilization
online memory
long-term agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

delta-rule learning
online memory
low-rank correction
associative memory
frozen LLM