🤖 AI Summary
Long-context modeling faces a fundamental trade-off between unbounded context extension and linear computational complexity. This paper introduces MemAgent, an end-to-end framework for long-context modeling that employs segmented reading and overwrite-based memory updating. It adapts the DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) algorithm to long-text training, integrated within a reinforcement learning agent architecture that generates multiple independent contextual dialogues—enabling efficient memory management and ultra-long-context extrapolation. A model with an 8K context window, trained on 32K-token texts, robustly extrapolates to question-answering tasks of up to 3.5M tokens with less than 5% performance degradation, and achieves over 95% accuracy on the 512K-token RULER benchmark. These results demonstrate substantial improvements in modeling long-range dependencies and in generalization to extremely long inputs.
📝 Abstract
Despite improvements from length extrapolation, efficient attention, and memory modules, handling infinitely long documents with linear complexity and without performance degradation during extrapolation remains the ultimate challenge in long-text processing. We directly optimize for long-text tasks in an end-to-end fashion and introduce a novel agent workflow, MemAgent, which reads text in segments and updates the memory using an overwrite strategy. We extend the DAPO algorithm to facilitate training via independent-context multi-conversation generation. MemAgent demonstrates superb long-context capabilities, extrapolating from an 8K context trained on 32K text to a 3.5M QA task with performance loss below 5%, and achieving 95%+ accuracy on the 512K RULER test.
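The workflow described above — segmented reading with an overwrite-based memory update in independent contexts — can be sketched as a simple loop. This is a minimal illustration, not the paper's implementation: in MemAgent the `update_memory` step is an LLM generation trained end-to-end with RL, whereas here it is a trivial keyword filter so the control flow is runnable; all function names are illustrative assumptions.

```python
# Sketch of MemAgent-style segmented reading with overwrite memory.
# Key property: memory is *replaced* (not appended to) after each segment,
# so its size stays O(1) in document length, giving linear total cost.

def segments(sentences, size):
    """Split a list of sentences into fixed-size reading segments."""
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]

def update_memory(memory, segment, query, budget=3):
    """Stub overwrite step (stand-in for an LLM call): keep only
    query-relevant sentences, capped at a fixed memory budget."""
    candidates = memory + segment
    kept = [s for s in candidates if any(w in s for w in query.split())]
    return kept[-budget:]  # previous memory is discarded, not accumulated

def memagent_final_memory(sentences, query, seg_size=4):
    memory = []
    for seg in segments(sentences, seg_size):
        # Each step runs in an independent context: the model would see
        # only (query, current memory, current segment).
        memory = update_memory(memory, seg, query)
    return memory  # in MemAgent, the final answer is generated from this
```

Because each segment is processed in a fresh context conditioned only on the query and the bounded memory, the number of model calls grows linearly with document length while per-call cost stays constant — the source of the linear-complexity claim.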