Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

📅 2026-05-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the difficulty that large language model (LLM) agents, constrained by limited context windows, have in tracking users' evolving preferences over long interactions. Existing memory systems mostly rely on static, hand-crafted update rules, while RL-based approaches learn from sparse outcome rewards, which makes long-horizon optimization unstable. Inspired by human cognitive memory mechanisms, the authors propose MemCoE, a two-stage memory co-evolution framework: it first induces a global memory guideline from contrastive feedback, then uses that guideline to define structured process rewards for multi-turn reinforcement learning of a memory evolution policy. The approach draws on schema theory and the functional division between the prefrontal cortex and the hippocampus, so the framework learns both how memory should be organized and what content to update. On three personalized memory benchmarks, MemCoE shows consistent improvements over strong baselines, with favorable robustness, transferability, and efficiency across explicit and implicit preferences, different data scales, and noisy settings.

📝 Abstract

Large language model (LLM) agents require long-term user memory for consistent personalization, but limited context windows hinder tracking evolving preferences over long interactions. Existing memory systems mainly rely on static, hand-crafted update rules; although reinforcement learning (RL)-based agents learn memory updates, sparse outcome rewards provide weak supervision, resulting in unstable long-horizon optimization. Drawing on memory schema theory and the functional division between prefrontal and hippocampal regions, we introduce MemCoE, a cognition-inspired two-stage optimization framework that learns how memory should be organized and what information to update. In the first stage, we propose Memory Guideline Induction to optimize a global guideline via contrastive feedback interpreted as textual gradients; in the second stage, Guideline-Aligned Memory Policy Optimization uses the induced guideline to define structured process rewards and performs multi-turn RL to learn a guideline-following memory evolution policy. We evaluate on three personalization memory benchmarks, covering explicit and implicit preferences as well as different data sizes and noise levels, and observe consistent improvements over strong baselines with favorable robustness, transferability, and efficiency.
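
To make the contrast between sparse outcome rewards and structured process rewards concrete, here is a minimal Python sketch. It is not the paper's implementation: the `MemoryUpdate` structure, the two example checks, the `process_weight` mixing coefficient, and the discount `gamma` are all hypothetical choices made for illustration. It only shows the general idea of scoring each turn's memory update against guideline-derived checks and adding those dense scores to a single end-of-episode outcome reward.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MemoryUpdate:
    """One memory edit made at a single turn (hypothetical structure)."""
    turn: int
    text: str


# A guideline check maps an update to a score in [0, 1] (illustrative only).
GuidelineCheck = Callable[[MemoryUpdate], float]


def process_rewards(updates: List[MemoryUpdate],
                    checks: List[GuidelineCheck]) -> List[float]:
    """Dense per-turn reward: the mean of all guideline checks for that turn."""
    if not checks:
        raise ValueError("at least one guideline check is required")
    return [sum(check(u) for check in checks) / len(checks) for u in updates]


def shaped_returns(updates: List[MemoryUpdate],
                   checks: List[GuidelineCheck],
                   outcome_reward: float,
                   process_weight: float = 0.5,
                   gamma: float = 1.0) -> List[float]:
    """Per-turn return combining weighted process rewards with the final outcome.

    The outcome reward is assumed to arrive once, after the last turn.
    """
    dense = process_rewards(updates, checks)
    returns = [0.0] * len(updates)
    running = outcome_reward
    for t in reversed(range(len(updates))):
        running = process_weight * dense[t] + gamma * running
        returns[t] = running
    return returns


# Two toy checks standing in for rules an induced guideline might contain.
def is_concise(update: MemoryUpdate) -> float:
    return 1.0 if len(update.text) <= 80 else 0.0


def is_tagged_preference(update: MemoryUpdate) -> float:
    return 1.0 if update.text.startswith("pref:") else 0.0


episode = [
    MemoryUpdate(turn=0, text="pref: likes tea"),
    MemoryUpdate(turn=1, text="x" * 100),  # long and untagged
]
dense = process_rewards(episode, [is_concise, is_tagged_preference])
returns = shaped_returns(episode, [is_concise, is_tagged_preference],
                         outcome_reward=1.0)
```

With outcome-only supervision both turns would receive the same signal; in this sketch the first turn, whose update passes both toy checks, ends up with a higher return than the second, which is the kind of per-turn credit that process rewards are meant to provide.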

Problem

Research questions and friction points this paper is trying to address.

long-term memory

personalization

memory evolution

reinforcement learning

context window limitation

Innovation

Methods, ideas, or system contributions that make the work stand out.

cognition-inspired memory

two-stage optimization

memory guideline induction