🤖 AI Summary
Long-context modeling faces a fundamental trade-off between unbounded context extension and linear computational complexity. This paper introduces MemAgent, an end-to-end framework for long-context modeling that employs segmented reading and overwrite-based memory updating. It adapts the DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) algorithm to long-text training, integrated within a reinforcement learning agent architecture that generates multiple independent contextual dialogues—enabling efficient memory management and ultra-long-context extrapolation. A model with an 8K context window, trained on 32K-token texts, robustly extrapolates to question-answering tasks of up to 3.5M tokens with less than 5% performance degradation, and achieves over 95% accuracy on the 512K-token RULER benchmark. These results demonstrate substantial improvements in modeling long-range dependencies and in generalization to extremely long inputs.
📝 Abstract
Despite improvements from length extrapolation, efficient attention, and memory modules, handling infinitely long documents with linear complexity and without performance degradation during extrapolation remains the ultimate challenge in long-text processing. We directly optimize for long-text tasks in an end-to-end fashion and introduce a novel agent workflow, MemAgent, which reads text in segments and updates the memory using an overwrite strategy. We extend the DAPO algorithm to facilitate training via independent-context multi-conversation generation. MemAgent demonstrates superb long-context capabilities, extrapolating from an 8K context trained on 32K text to a 3.5M QA task with performance loss below 5%, and achieving 95%+ accuracy on the 512K RULER test.
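The workflow described above — segmented reading with an overwrite-based memory update in independent contexts — can be sketched as a simple loop. This is a minimal illustration, not the paper's implementation: in MemAgent the `update_memory` step is an LLM generation trained end-to-end with RL, whereas here it is a trivial keyword filter so the control flow is runnable; all function names are illustrative assumptions.

```python
# Sketch of MemAgent-style segmented reading with overwrite memory.
# Key property: memory is *replaced* (not appended to) after each segment,
# so its size stays O(1) in document length, giving linear total cost.

def segments(sentences, size):
    """Split a list of sentences into fixed-size reading segments."""
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]

def update_memory(memory, segment, query, budget=3):
    """Stub overwrite step (stand-in for an LLM call): keep only
    query-relevant sentences, capped at a fixed memory budget."""
    candidates = memory + segment
    kept = [s for s in candidates if any(w in s for w in query.split())]
    return kept[-budget:]  # previous memory is discarded, not accumulated

def memagent_final_memory(sentences, query, seg_size=4):
    memory = []
    for seg in segments(sentences, seg_size):
        # Each step runs in an independent context: the model would see
        # only (query, current memory, current segment).
        memory = update_memory(memory, seg, query)
    return memory  # in MemAgent, the final answer is generated from this
```

Because each segment is processed in a fresh context conditioned only on the query and the bounded memory, the number of model calls grows linearly with document length while per-call cost stays constant — the source of the linear-complexity claim.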