Mem-T: Densifying Rewards for Long-Horizon Memory Agents

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of end-to-end optimization in long-horizon memory agents, which is hindered by sparse and delayed reward signals. The authors propose Mem-T, an agent that integrates a lightweight hierarchical memory database with a multi-round dynamic retrieval mechanism. To enable effective learning, they introduce MoT-GRPO, a tree-guided reinforcement learning framework that leverages memory operation trees for backpropagation and hindsight credit assignment, thereby transforming sparse terminal rewards into dense, step-level supervision signals. This approach enables, for the first time, joint dense training of both the memory construction and retrieval policies. Experimental results demonstrate that Mem-T outperforms frameworks such as A-Mem and Mem0 by up to 14.92% in accuracy while reducing inference tokens per query by approximately 24.45% relative to GAM, achieving state-of-the-art performance on the accuracy–efficiency Pareto frontier.

📝 Abstract
Memory agents, which depart from predefined memory-processing pipelines by endogenously managing the processing, storage, and retrieval of memories, have garnered increasing attention for their autonomy and adaptability. However, existing training paradigms remain constrained: agents often traverse long-horizon sequences of memory operations before receiving sparse and delayed rewards, which hinders truly end-to-end optimization of memory management policies. To address this limitation, we introduce Mem-T, an autonomous memory agent that interfaces with a lightweight hierarchical memory database to perform dynamic updates and multi-turn retrieval over streaming inputs. To effectively train long-horizon memory management capabilities, we further propose MoT-GRPO, a tree-guided reinforcement learning framework that transforms sparse terminal feedback into dense, step-wise supervision via memory operation tree backpropagation and hindsight credit assignment, thereby enabling the joint optimization of memory construction and retrieval. Extensive experiments demonstrate that Mem-T is (1) high-performing, surpassing frameworks such as A-Mem and Mem0 by up to $14.92\%$, and (2) economical, operating on a favorable accuracy-efficiency Pareto frontier and reducing inference tokens per query by $\sim24.45\%$ relative to GAM without sacrificing performance.
Problem

Research questions and friction points this paper is trying to address.

memory agents
long-horizon
sparse rewards
reward densification
end-to-end optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Memory Agent
Long-Horizon Reinforcement Learning
Reward Densification
Hierarchical Memory Database
Tree-Guided Policy Optimization