Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents

📅 2025-09-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In long-context question answering, critical evidence is often dispersed and prone to coverage gaps or loss under conventional unidirectional "read-then-remember" processing. To address this, the paper proposes ReMemR1, a memory-augmented agent with callback-enhanced memory that enables non-linear reasoning and dynamic retrieval over the full history of memory states, overcoming the limitations of forward-only, single-pass processing. It further introduces Reinforcement Learning with Multi-Level Rewards (RLMLR), which jointly incorporates step-wise memory-utilization feedback and final-answer accuracy, substantially increasing the density of the supervision signal. Within this framework, ReMemR1 supports selective memory retrospection and fine-grained training. Experiments on long-document multi-hop QA benchmarks show that ReMemR1 significantly outperforms existing memory-based baselines, mitigating information degradation while improving long-range reasoning and contextual understanding.

📝 Abstract
Large language models face challenges in long-context question answering, where key evidence for a query may be dispersed across millions of tokens. Existing works equip large language models with a memory corpus that is dynamically updated during a single-pass document scan, also known as the "memorize while reading" approach. While this approach scales efficiently, it suffers from irreversible forward-only processing, information loss through overwriting, and sparse reinforcement learning signals. To tackle these challenges, we present ReMemR1, a memory-augmented agent with callback-enhanced memory that allows selective retrieval from the entire memory history, supporting non-linear reasoning and revisiting of early evidence. To further strengthen training, we propose Reinforcement Learning with Multi-Level Rewards (RLMLR), which combines final-answer rewards with dense, step-level signals that guide effective memory use. Together, these contributions mitigate information degradation, improve supervision, and support multi-hop memory utilization. Experiments on long-document QA show significant gains over existing memory-based approaches, validating ReMemR1 as an effective solution for long-context reasoning agents.
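The paper itself provides no code here, but the callback-enhanced memory idea can be sketched minimally: instead of a single mutable summary that gets overwritten during the document scan, the agent keeps an append-only history and can selectively retrieve from any earlier state. All names below, and the toy word-overlap retriever, are illustrative assumptions, not the paper's implementation (which would use a learned retriever over memory states).

```python
from dataclasses import dataclass, field

@dataclass
class RevisitableMemory:
    """Append-only memory: new notes never overwrite earlier ones,
    so the agent can 'call back' to any previous memory state."""
    history: list[str] = field(default_factory=list)

    def write(self, note: str) -> None:
        # Append instead of overwriting: no information is lost.
        self.history.append(note)

    def callback(self, query: str, top_k: int = 2) -> list[str]:
        # Toy relevance score: count words shared with the query.
        # A real agent would rank memory states with a learned retriever.
        q = set(query.lower().split())
        scored = sorted(
            self.history,
            key=lambda note: len(q & set(note.lower().split())),
            reverse=True,
        )
        return scored[:top_k]

mem = RevisitableMemory()
mem.write("Chapter 1: Alice moves to Paris in 1920")
mem.write("Chapter 5: Alice meets Bob at a cafe")
mem.write("Chapter 9: Bob leaves Paris for London")
hits = mem.callback("Where did Alice move in 1920")
```

The key contrast with "memorize while reading" baselines is that `callback` ranks over the *entire* history, so evidence written early in the scan remains retrievable even after many later writes.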
Problem

Research questions and friction points this paper is trying to address.

Addresses irreversible memory processing in long-context question answering
Mitigates information loss from overwriting in memory-augmented agents
Enhances multi-hop reasoning with revisitable memory and dense rewards
Innovation

Methods, ideas, or system contributions that make the work stand out.

Revisitable memory enabling selective retrieval from history
Callback-enhanced memory supporting non-linear evidence revisiting
Multi-level reinforcement learning combining dense and final rewards
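The multi-level reward can be illustrated with a small sketch: a dense per-step score for useful memory operations is blended with the sparse final-answer reward. The function name, inputs, and weights below are hypothetical; the paper specifies only that step-level memory-use signals and final-answer correctness are combined.

```python
def multi_level_reward(step_memory_hits: list[bool],
                       answer_correct: bool,
                       step_weight: float = 0.3,
                       final_weight: float = 0.7) -> float:
    """Blend a dense step-level signal with a sparse final reward.

    step_memory_hits: one flag per reasoning step, True if the step's
        memory operation retrieved evidence that was actually useful.
    answer_correct: whether the final answer matched the gold answer.
    The 0.3/0.7 weighting is an illustrative assumption.
    """
    # Dense signal: fraction of steps that used memory effectively.
    step_score = sum(step_memory_hits) / max(len(step_memory_hits), 1)
    # Sparse signal: final-answer correctness.
    final_score = 1.0 if answer_correct else 0.0
    return step_weight * step_score + final_weight * final_score
```

Because the step-level term is nonzero even when the final answer is wrong, the policy receives gradient signal on intermediate memory decisions, which is what the abstract means by densifying the otherwise sparse RL supervision.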
Yaorui Shi
University of Science and Technology of China
Large Language Model
Yuxin Chen
National University of Singapore
Siyuan Wang
Shanghai Jiao Tong University
Sihang Li
University of Science and Technology of China
Hengxing Cai
Sun Yat-sen University
LLM, VLM, VLN, UAV
Qi Gu
Meituan
Xiang Wang
University of Science and Technology of China
An Zhang
University of Science and Technology
Generative Models, Trustworthy AI, Agentic AI, Recommender System