MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

📅 2025-07-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Long-context modeling faces a fundamental trade-off between unbounded context extension and linear computational complexity. This paper introduces MemAgent, an end-to-end framework for long-context modeling that employs segmented reading and overwrite-based memory updating. It adapts the DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) algorithm to long-text training, integrated within a reinforcement learning agent architecture that generates multiple independent contextual dialogues, enabling efficient memory management and ultra-long-context extrapolation. A model with an 8K context window, trained on 32K-token sequences, robustly extrapolates to question-answering tasks requiring up to 3.5M tokens, with less than 5% performance degradation, and achieves over 95% accuracy on the 512K-token RULER benchmark. These results demonstrate substantial improvements in modeling long-range dependencies and in generalization efficiency for extremely long inputs.

📝 Abstract
Despite improvements from length extrapolation, efficient attention, and memory modules, handling infinitely long documents with linear complexity and without performance degradation during extrapolation remains the ultimate challenge in long-text processing. We directly optimize for long-text tasks in an end-to-end fashion and introduce a novel agent workflow, MemAgent, which reads text in segments and updates the memory using an overwrite strategy. We extend the DAPO algorithm to facilitate training via independent-context multi-conversation generation. MemAgent demonstrates superb long-context capabilities: trained on 32K text with an 8K context, it extrapolates to a 3.5M-token QA task with less than 5% performance loss and achieves 95%+ on the 512K RULER test.
Problem

Research questions and friction points this paper is trying to address.

Handling infinitely long documents with linear complexity
Preventing performance degradation during long-text extrapolation
Optimizing end-to-end long-text processing with memory management
Innovation

Methods, ideas, or system contributions that make the work stand out.

Segment-based text reading with memory updates
DAPO algorithm for multi-conversation training
Linear complexity for infinite document processing
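The segmented-reading loop summarized above can be sketched as follows. This is a minimal illustration of the overwrite-memory idea only, not the paper's released implementation; `call_llm` is a hypothetical stand-in for any chat-completion function, stubbed here so the sketch runs:

```python
def call_llm(prompt: str) -> str:
    # Placeholder LLM call: in practice this would query a trained model.
    # Here it echoes a truncated prompt so the sketch is runnable.
    return prompt[:200]

def chunk(text: str, size: int):
    """Split the document into fixed-size segments (linear in length)."""
    for i in range(0, len(text), size):
        yield text[i:i + size]

def memagent_answer(document: str, question: str,
                    segment_chars: int = 1000) -> str:
    memory = ""  # fixed-size memory, overwritten after each segment
    for segment in chunk(document, segment_chars):
        # Each segment is processed in an independent conversation:
        # the model sees only (memory, segment, question) and rewrites
        # the memory, so cost stays linear in document length.
        memory = call_llm(
            f"Memory: {memory}\nSegment: {segment}\n"
            f"Question: {question}\nUpdate the memory:"
        )
    # The final answer is produced from the compressed memory alone.
    return call_llm(f"Memory: {memory}\nQuestion: {question}\nAnswer:")
```

Because the model never attends to more than one segment plus a bounded memory, the context window stays constant while the document length grows without bound.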
Hongli Yu
ByteDance Seed; Institute for AI Industry Research (AIR), Tsinghua University; SIA-Lab of Tsinghua AIR and ByteDance Seed
Tinghong Chen
Institute for AI Industry Research (AIR), Tsinghua University; SIA-Lab of Tsinghua AIR and ByteDance Seed
Jiangtao Feng
Institute for AI Industry Research (AIR), Tsinghua University; SIA-Lab of Tsinghua AIR and ByteDance Seed
Jiangjie Chen
ByteDance Seed
NLP, Machine Reasoning, Large Language Models, Autonomous Agent
Weinan Dai
Tsinghua University
Artificial Intelligence, Large Language Models, Reinforcement Learning
Qiying Yu
Tsinghua University
Multimodal Learning, Self-supervised Learning, Large Models
Ya-Qin Zhang
Institute for AI Industry Research (AIR), Tsinghua University; SIA-Lab of Tsinghua AIR and ByteDance Seed
Wei-Ying Ma
Tsinghua University
Generative AI and Large Language Models (LLMs) for Science
Jingjing Liu
Institute for AI Industry Research (AIR), Tsinghua University; SIA-Lab of Tsinghua AIR and ByteDance Seed
Mingxuan Wang
ByteDance Seed; SIA-Lab of Tsinghua AIR and ByteDance Seed
Hao Zhou
Institute for AI Industry Research (AIR), Tsinghua University; SIA-Lab of Tsinghua AIR and ByteDance Seed