Mem-T: Densifying Rewards for Long-Horizon Memory Agents

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of end-to-end optimization in long-horizon memory agents, which is hindered by sparse and delayed reward signals. The authors propose Mem-T, an agent that integrates a lightweight hierarchical memory database with a multi-round dynamic retrieval mechanism. To enable effective learning, they introduce MoT-GRPO, a tree-guided reinforcement learning framework that leverages memory operation trees for backpropagation and hindsight credit assignment, thereby transforming sparse terminal rewards into dense, step-level supervision signals. This approach enables, for the first time, joint dense training of both the memory construction and retrieval policies. Experimental results demonstrate that Mem-T outperforms frameworks such as A-Mem and Mem0 by up to 14.92% in accuracy while reducing inference tokens per query by approximately 24.45% relative to GAM, achieving state-of-the-art performance on the accuracy–efficiency Pareto frontier.

📝 Abstract
Memory agents, which depart from predefined memory-processing pipelines by endogenously managing the processing, storage, and retrieval of memories, have garnered increasing attention for their autonomy and adaptability. However, existing training paradigms remain constrained: agents often traverse long-horizon sequences of memory operations before receiving sparse and delayed rewards, which hinders truly end-to-end optimization of memory management policies. To address this limitation, we introduce Mem-T, an autonomous memory agent that interfaces with a lightweight hierarchical memory database to perform dynamic updates and multi-turn retrieval over streaming inputs. To effectively train long-horizon memory management capabilities, we further propose MoT-GRPO, a tree-guided reinforcement learning framework that transforms sparse terminal feedback into dense, step-wise supervision via memory operation tree backpropagation and hindsight credit assignment, thereby enabling the joint optimization of memory construction and retrieval. Extensive experiments demonstrate that Mem-T is (1) high-performing, surpassing frameworks such as A-Mem and Mem0 by up to $14.92\%$, and (2) economical, operating on a favorable accuracy-efficiency Pareto frontier and reducing inference tokens per query by $\sim24.45\%$ relative to GAM without sacrificing performance.
Problem

Research questions and friction points this paper is trying to address.

memory agents
long-horizon
sparse rewards
reward densification
end-to-end optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Memory Agent
Long-Horizon Reinforcement Learning
Reward Densification
Hierarchical Memory Database
Tree-Guided Policy Optimization