MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents

📅 2025-06-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the unbounded memory growth, high computational overhead, and poor generalization to out-of-distribution context lengths that full-context prompting causes in long-horizon, multi-turn language agents, this paper proposes MEM1, an end-to-end reinforcement learning framework that lets agents jointly consolidate memory and reason under a constant memory budget. Its core contributions are: (1) a reasoning-driven memory consolidation mechanism in which the agent, at each turn, updates a compact shared internal state that integrates prior memory with new observations while discarding irrelevant or redundant information; and (2) a simple, scalable recipe for constructing multi-turn training environments by composing existing datasets into arbitrarily complex task sequences. Evaluated on a 16-objective multi-hop question-answering benchmark, MEM1-7B improves performance by 3.5× over Qwen2.5-14B-Instruct while reducing memory usage by 3.7×, and generalizes beyond its training horizon.

📝 Abstract
Modern language agents must operate over long-horizon, multi-turn interactions, where they retrieve external information, adapt to observations, and answer interdependent queries. Yet, most LLM systems rely on full-context prompting, appending all past turns regardless of their relevance. This leads to unbounded memory growth, increased computational costs, and degraded reasoning performance on out-of-distribution input lengths. We introduce MEM1, an end-to-end reinforcement learning framework that enables agents to operate with constant memory across long multi-turn tasks. At each turn, MEM1 updates a compact shared internal state that jointly supports memory consolidation and reasoning. This state integrates prior memory with new observations from the environment while strategically discarding irrelevant or redundant information. To support training in more realistic and compositional settings, we propose a simple yet effective and scalable approach to constructing multi-turn environments by composing existing datasets into arbitrarily complex task sequences. Experiments across three domains, including internal retrieval QA, open-domain web QA, and multi-turn web shopping, show that MEM1-7B improves performance by 3.5x while reducing memory usage by 3.7x compared to Qwen2.5-14B-Instruct on a 16-objective multi-hop QA task, and generalizes beyond the training horizon. Our results demonstrate the promise of reasoning-driven memory consolidation as a scalable alternative to existing solutions for training long-horizon interactive agents, where both efficiency and performance are optimized.
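The constant-memory loop the abstract describes can be sketched as follows. This is an illustrative toy, not the paper's code: `ToyEnv`, `consolidate`, and the truncation budget are all hypothetical stand-ins, and the real MEM1 policy learns what to keep via reinforcement learning rather than simple truncation. The key point is that the agent conditions only on its compact state plus the newest observation, never the full interaction history.

```python
class ToyEnv:
    """Toy stand-in for a multi-turn environment (illustrative only)."""
    def __init__(self, observations):
        self.observations = list(observations)

    def reset(self):
        # First observation, e.g. the task description.
        return self.observations.pop(0)

    def step(self, action):
        # Return the next observation and a done flag.
        if self.observations:
            return self.observations.pop(0), False
        return "", True


def consolidate(state, obs, budget=60):
    """Stand-in for the learned update: merge the prior state with the
    new observation, then cap it at a fixed budget so memory stays
    constant across turns. MEM1 learns this consolidation via RL; here
    we crudely keep only the most recent characters."""
    merged = (state + " " + obs).strip()
    return merged[-budget:]


def run_agent(env, max_turns=16):
    """Constant-memory agent loop: context size is bounded by the state
    budget plus one observation, regardless of how many turns elapse."""
    state = ""
    obs = env.reset()
    for _ in range(max_turns):
        state = consolidate(state, obs)   # memory + reasoning share one state
        obs, done = env.step("noop")      # placeholder action
        if done:
            break
    return state
```

Note that full-context prompting would instead append every observation, growing the prompt linearly with the number of turns; here the state length never exceeds the budget.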
Problem

Research questions and friction points this paper is trying to address.

Reducing memory growth in long-horizon LLM interactions
Improving reasoning performance with constant memory
Enhancing efficiency in multi-turn task environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning for constant memory agents
Compact shared state for memory and reasoning
Composing datasets for complex task sequences
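The dataset-composition idea above can be sketched as chaining independent QA pairs into one multi-objective episode, so the agent must resolve many interdependent queries in sequence. The function name, joining format, and sampling scheme below are illustrative assumptions, not the paper's actual construction:

```python
import random


def compose_tasks(qa_pairs, n_objectives=16, seed=0):
    """Sketch of composing an existing QA dataset into a multi-objective
    episode (hypothetical format). Samples `n_objectives` distinct
    question-answer pairs and chains the questions into one prompt the
    agent must work through turn by turn."""
    rng = random.Random(seed)
    chosen = rng.sample(qa_pairs, n_objectives)
    prompt = " Then, ".join(question for question, _ in chosen)
    answers = [answer for _, answer in chosen]
    return prompt, answers
```

Because the episode length is just a sampling parameter, the same recipe scales from short training tasks to the longer horizons used to test generalization.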