LM2: Large Memory Models

📅 2025-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Standard Transformers exhibit inherent limitations in multi-step reasoning, relational argumentation, and long-context integration. To address these challenges, this paper proposes LM2, a Large Memory Model that introduces a lightweight, non-intrusive, interactive auxiliary memory module within a decoder-only architecture. The module supports test-time adaptive memory updates and explicit memory modeling without altering the base model's structure or disrupting standard pretraining pipelines. It integrates cross-attention mechanisms, gated memory updates, and a context representation repository, ensuring interpretability and broad applicability. Experiments demonstrate that LM2 achieves substantial gains: +37.1% average accuracy over RMT and +86.3% over Llama-3.2 on BABILong; +5.0% on MMLU; and state-of-the-art performance on multi-hop reasoning, numerical reasoning, and hundred-thousand-token-context question answering.

📝 Abstract
This paper introduces the Large Memory Model (LM2), a decoder-only Transformer architecture enhanced with an auxiliary memory module that aims to address the limitations of standard Transformers in multi-step reasoning, relational argumentation, and synthesizing information distributed over long contexts. The proposed LM2 incorporates a memory module that acts as a contextual representation repository, interacting with input tokens via cross-attention and updating through gating mechanisms. To preserve the Transformer's general-purpose capabilities, LM2 maintains the original information flow while integrating a complementary memory pathway. Experimental results on the BABILong benchmark demonstrate that the LM2 model outperforms both the memory-augmented RMT model by 37.1% and the baseline Llama-3.2 model by 86.3% on average across tasks. LM2 exhibits exceptional capabilities in multi-hop inference, numerical reasoning, and large-context question answering. On the MMLU dataset, it achieves a 5.0% improvement over a pre-trained vanilla model, demonstrating that its memory module does not degrade performance on general tasks. Further, in our analysis, we explore memory interpretability, the effectiveness of memory modules, and test-time behavior. Our findings emphasize the importance of explicit memory in enhancing Transformer architectures.
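The mechanism the abstract describes (a bank of memory slots that input tokens read via cross-attention, with a gated write-back) can be sketched in a few lines of NumPy. This is an illustrative toy under assumed shapes, not the authors' implementation: the weight matrices `Wq`, `Wk`, `Wv`, `Wg` and the candidate-update path are hypothetical stand-ins for the paper's learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_read(tokens, memory, Wq, Wk, Wv):
    """Cross-attention: input tokens (queries) attend over memory slots (keys/values)."""
    Q, K, V = tokens @ Wq, memory @ Wk, memory @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # (T, N) token-to-slot affinities
    return softmax(scores) @ V                # (T, d) readout, added to the decoder stream

def gated_memory_update(memory, candidate, Wg):
    """Sigmoid gate blends each slot's old content with its candidate update."""
    gate = 1.0 / (1.0 + np.exp(-(np.concatenate([memory, candidate], axis=-1) @ Wg)))
    return gate * candidate + (1.0 - gate) * memory

d, T, N = 16, 8, 4                            # hidden size, #tokens, #memory slots (toy values)
tokens = rng.standard_normal((T, d))
memory = rng.standard_normal((N, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
Wg = rng.standard_normal((2 * d, d)) * 0.1

readout = memory_read(tokens, memory, Wq, Wk, Wv)   # complementary pathway back into the model
candidate = rng.standard_normal((N, d))             # stand-in for memory attending to new tokens
memory = gated_memory_update(memory, candidate, Wg) # test-time adaptive update
```

Because the gated update only interpolates slot contents and the readout is added as a separate pathway, the base Transformer's original information flow is left intact, which is the property the abstract credits for preserving general-task performance.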
Problem

Research questions and friction points this paper is trying to address.

Enhancing multi-step reasoning in Transformers
Improving relational argumentation with memory modules
Synthesizing information across long contexts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces auxiliary memory module
Enhances multi-step reasoning capabilities
Maintains Transformer information flow
Jikun Kang
LMTS at Salesforce
Machine Learning, Reinforcement Learning
Wenqi Wu
Convergence Labs Ltd.
Filippos Christianos
University of Edinburgh
Alex J. Chan
Director of Engineering, Salesforce
Machine Learning, Inverse Reinforcement Learning, Imitation Learning
Fraser Greenlee
Convergence Labs Ltd.
George Thomas
Convergence Labs Ltd.
Marvin Purtorab
Convergence Labs Ltd.
Andy Toulis
Convergence Labs Ltd.