🤖 AI Summary
This work addresses the difficulty of training memory managers for large language model agents on long-horizon tasks, where sparse final rewards make credit assignment challenging. To tackle this, the authors propose Fine-Mem, a framework that introduces chunk-level step rewards and an evidence-based reward attribution mechanism. By combining reinforcement learning with auxiliary question-answering tasks and an evidence-anchored reward redistribution strategy, Fine-Mem aligns fine-grained memory operations with global task objectives. Evaluated on the MemAlpha and MemoryAgentBench benchmarks, Fine-Mem significantly outperforms strong baselines, with consistent gains in subtask success rates and generalization.
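The chunk-level step reward mentioned above can be illustrated with a toy sketch. This is an assumption-laden reconstruction, not the paper's implementation: after the manager ingests a chunk, the current memory state is probed with chunk-specific QA pairs, and QA accuracy serves as the immediate step reward. All function and variable names below are hypothetical.

```python
def chunk_step_reward(memory, qa_pairs, answer_fn):
    """Immediate step reward for one ingested chunk: the fraction of
    chunk-specific QA pairs answered correctly from the current memory.
    `answer_fn(memory, question)` is a stand-in for the agent's QA model.
    (Illustrative sketch; the actual reward shaping may differ.)"""
    if not qa_pairs:
        return 0.0
    correct = sum(1 for q, gold in qa_pairs if answer_fn(memory, q) == gold)
    return correct / len(qa_pairs)

# Toy usage: memory as a dict of facts, QA model as a simple lookup.
memory = {"capital_of_france": "Paris", "author_of_hamlet": "Shakespeare"}
qa = [("capital_of_france", "Paris"), ("author_of_hamlet", "Marlowe")]
reward = chunk_step_reward(memory, qa, lambda m, q: m.get(q))
# reward == 0.5: one of the two chunk-specific questions is answered correctly
```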
📝 Abstract
Effective memory management is essential for large language model agents to navigate long-horizon tasks. Recent research has explored using reinforcement learning to develop specialized memory manager agents. However, existing approaches rely on final task performance as the primary reward, which results in severe reward sparsity and ineffective credit assignment, providing insufficient guidance for individual memory operations. To address this, we propose Fine-Mem, a unified framework designed for fine-grained feedback alignment. First, we introduce a Chunk-level Step Reward that provides immediate step-level supervision via auxiliary chunk-specific question-answering tasks. Second, we devise Evidence-Anchored Reward Attribution, which redistributes global rewards by anchoring credit to key memory operations based on the specific memory items used as evidence during reasoning. Together, these components enable stable policy optimization and align local memory operations with the long-term utility of memory. Experiments on MemAlpha and MemoryAgentBench demonstrate that Fine-Mem consistently outperforms strong baselines, achieving superior success rates across diverse sub-tasks. Further analysis reveals its adaptability and strong generalization across model configurations and backbones.
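The evidence-anchored redistribution idea can be sketched as follows. This is a minimal illustration under assumed data structures (operation records with `op_id` and `item_id`, a set of evidence item ids, and an `anchor_weight` hyperparameter), not the paper's actual algorithm: operations whose memory items are later cited as evidence receive the anchored share of the global reward, while the remainder is spread uniformly over all operations.

```python
def redistribute_reward(global_reward, operations, evidence_ids, anchor_weight=0.8):
    """Split a global task reward across memory operations.

    Operations that produced a memory item later used as evidence share
    `anchor_weight` of the reward; the rest is spread uniformly.
    (Hypothetical scheme and parameter names for illustration only.)
    """
    anchored = [op for op in operations if op["item_id"] in evidence_ids]
    if anchored:
        anchored_share = anchor_weight * global_reward / len(anchored)
        base_share = (1.0 - anchor_weight) * global_reward / len(operations)
    else:
        # No evidence recorded: fall back to uniform credit.
        anchored_share = 0.0
        base_share = global_reward / len(operations)
    credits = {}
    for op in operations:
        credit = base_share
        if op["item_id"] in evidence_ids:
            credit += anchored_share
        credits[op["op_id"]] = credit
    return credits
```

Note that the per-operation credits always sum to the global reward, so the redistribution reshapes the learning signal without changing its total magnitude.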