$δ$-mem: Efficient Online Memory for Large Language Models

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models struggle to reuse historical information efficiently in long-horizon tasks, and extending the context window is computationally expensive with diminishing returns. The authors propose δ-mem, a lightweight online memory mechanism that compresses past interactions into a fixed-size associative memory state matrix (only 8×8) and uses its readout to apply low-rank corrections to the attention computation of a frozen backbone. Memory and attention are thereby tightly coupled without full fine-tuning, backbone replacement, or explicit context extension. Experiments show that δ-mem largely preserves general capabilities while reaching, on average, 1.10× the score of the frozen backbone and 1.15× that of the strongest non-δ-mem memory baseline, with larger gains of 1.31× on MemoryAgentBench and 1.20× on LoCoMo, both memory-intensive benchmarks.
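To make the phrase "low-rank correction to a frozen backbone" concrete, here is a minimal generic sketch in plain Python. It is not the paper's design: the function names, the shapes, and the idea of scaling the low-rank path with a `gate` vector (standing in for a memory readout) are all assumptions made for illustration. It only shows the general pattern of leaving a frozen weight untouched and adding a rank-limited term on top.

```python
# Generic low-rank correction sketch. All names, shapes, and the gating by a
# memory readout are illustrative assumptions, not the paper's method.


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def low_rank_correction(x, down, up, gate):
    """Return up @ (gate * (down @ x)); its rank is at most len(gate)."""
    z = matvec(down, x)                      # project d -> r
    z = [g * zi for g, zi in zip(gate, z)]   # scale by the (assumed) readout
    return matvec(up, z)                     # project r -> d


def corrected_projection(x, frozen_w, down, up, gate):
    """Frozen projection plus a low-rank correction; frozen_w is never modified."""
    base = matvec(frozen_w, x)
    delta = low_rank_correction(x, down, up, gate)
    return [b + d for b, d in zip(base, delta)]


d, r = 4, 2
frozen_w = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
down = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]    # r x d
up = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]  # d x r
x = [1.0, 2.0, 3.0, 4.0]

no_memory = corrected_projection(x, frozen_w, down, up, gate=[0.0, 0.0])
with_memory = corrected_projection(x, frozen_w, down, up, gate=[0.5, -1.0])
```

With a zero gate the output equals the frozen projection, so the backbone's behavior is recovered exactly. The correction path uses `2 * d * r` weights rather than `d * d`, which is the usual reason for choosing a low-rank form.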

📝 Abstract

Large language models increasingly need to accumulate and reuse historical information in long-term assistants and agent systems. Simply expanding the context window is costly and often fails to ensure effective context utilization. We propose $δ$-mem, a lightweight memory mechanism that augments a frozen full-attention backbone with a compact online state of associative memory. $δ$-mem compresses past information into a fixed-size state matrix updated by delta-rule learning, and uses its readout to generate low-rank corrections to the backbone's attention computation during generation. With only an $8\times8$ online memory state, $δ$-mem improves the average score to $1.10\times$ that of the frozen backbone and $1.15\times$ that of the strongest non-$δ$-mem memory baseline. It achieves larger gains on memory-heavy benchmarks, reaching $1.31\times$ on MemoryAgentBench and $1.20\times$ on LoCoMo, while largely preserving general capabilities. These results show that effective memory can be realized through a compact online state directly coupled with attention computation, without full fine-tuning, backbone replacement, or explicit context extension.
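The delta-rule update mentioned in the abstract can be illustrated with the classic form S ← S + β (v − S k) kᵀ, where S is the fixed-size state, k a key, v a value, and β a step size. The sketch below is a minimal plain-Python version under that assumption. The abstract does not give the paper's exact parameterization, so the function names, the one-hot keys, and β = 1 are hypothetical choices for illustration only.

```python
# Hypothetical sketch of a delta-rule associative memory. The update form and
# all names are assumptions, not the paper's implementation.

D = 8  # state is D x D, matching the 8x8 size quoted in the abstract


def new_state(d=D):
    """Return a d x d zero matrix as a list of rows."""
    return [[0.0] * d for _ in range(d)]


def read(state, key):
    """Readout: state @ key."""
    return [sum(s * k for s, k in zip(row, key)) for row in state]


def delta_update(state, key, value, beta=1.0):
    """Classic delta rule: S <- S + beta * (value - S @ key) key^T."""
    pred = read(state, key)                     # what the memory recalls now
    err = [v - p for v, p in zip(value, pred)]  # prediction error
    return [
        [s + beta * e * k for s, k in zip(row, key)]
        for row, e in zip(state, err)
    ]


def one_hot(i, d=D):
    """Unit-norm key with a single 1.0 at position i."""
    return [1.0 if j == i else 0.0 for j in range(d)]


state = new_state()
k0, k1 = one_hot(0), one_hot(1)
state = delta_update(state, k0, [float(i) for i in range(D)])
state = delta_update(state, k1, [1.0] * D)
# Overwrite the value stored under k0; the state stays 8 x 8.
state = delta_update(state, k0, [2.0] * D)

recalled_k0 = read(state, k0)
recalled_k1 = read(state, k1)
```

With unit-norm, mutually orthogonal keys and β = 1, each update replaces the value stored under its key and leaves the others untouched, which is why the memory can be revised online without growing. Real keys are rarely orthogonal, so recall is generally approximate.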

Problem

Research questions and friction points this paper is trying to address.

large language models

memory

context utilization

online memory

long-term agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

delta-rule learning

online memory

low-rank correction