🤖 AI Summary
This work addresses two limitations of existing memory-augmented large language model (LLM)-based recommender systems: their flat memory structures fail to distinguish short-term interactions from long-term preferences, and they do not manage the full memory lifecycle. The authors formulate recommendation as a partially observable problem and propose a hierarchical belief-state memory architecture with event-level, preference-level, and profile-level layers. They further design an LLM-driven agent that adaptively orchestrates memory operations across a six-stage lifecycle. According to the authors, this is the first approach to enable autonomous scheduling of memory evolution within recommender systems. It achieves state-of-the-art performance on four InstructRec benchmark domains, with average gains of 26.4% in HR@1 and 10.3% in NDCG@10 over the strongest baselines, and shows further gains in dynamic scenarios where user preferences evolve.
📝 Abstract
Memory-augmented LLM agents have advanced personalized recommendation, yet existing approaches universally adopt flat memory representations that conflate ephemeral signals with stable preferences, and none provides a complete lifecycle governing how memory should evolve. We propose MARS (Memory-Augmented Agentic Recommender System), a framework that treats recommendation as a partially observable problem and maintains a structured belief state that progressively abstracts noisy behavioral observations into a compact estimate of user preferences. MARS organizes this belief state into three tiers: event memory buffers raw signals, preference memory maintains fine-grained mutable chunks with explicit strength and evidence tracking, and profile memory distills all preferences into a coherent natural language narrative. A complete lifecycle of six operations (extraction, reinforcement, weakening, consolidation, forgetting, and resynthesis) is adaptively scheduled by an LLM-based planner rather than by fixed-interval heuristics. Experiments on four InstructRec benchmark domains show that MARS achieves state-of-the-art performance, with average improvements of 26.4% in HR@1 and 10.3% in NDCG@10 over the strongest baselines, and with further gains from agentic scheduling in evolving settings.
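
To make the three-tier belief state and its six lifecycle operations concrete, the following is a minimal Python sketch. It is not the authors' implementation: the class names (`PreferenceChunk`, `BeliefState`), the numeric strength updates, the forgetting threshold, and the string-joining stand-in for LLM-based profile resynthesis are all assumptions made for illustration. The LLM-based planner is omitted, so the operations are called by hand.

```python
from dataclasses import dataclass, field


@dataclass
class PreferenceChunk:
    """Hypothetical preference chunk with explicit strength and evidence."""
    text: str
    strength: float = 1.0  # assumed initial strength
    evidence: list = field(default_factory=list)


@dataclass
class BeliefState:
    """Hypothetical three-tier belief state: events, preferences, profile."""
    events: list = field(default_factory=list)       # event memory: raw signals
    preferences: dict = field(default_factory=dict)  # preference memory: key -> chunk
    profile: str = ""                                # profile memory: narrative

    def observe(self, key, signal):
        """Buffer a raw behavioral signal in event memory."""
        self.events.append((key, signal))

    def extract(self):
        """Extraction: move buffered events into preference chunks."""
        for key, signal in self.events:
            chunk = self.preferences.setdefault(key, PreferenceChunk(text=key))
            chunk.evidence.append(signal)
        self.events.clear()

    def reinforce(self, key, amount=0.5):
        """Reinforcement: raise the strength of a supported preference."""
        self.preferences[key].strength += amount

    def weaken(self, key, amount=0.5):
        """Weakening: lower the strength of a contradicted preference."""
        chunk = self.preferences[key]
        chunk.strength = max(0.0, chunk.strength - amount)

    def consolidate(self, keep, drop):
        """Consolidation: merge one chunk into another overlapping chunk."""
        merged = self.preferences.pop(drop)
        target = self.preferences[keep]
        target.strength += merged.strength
        target.evidence.extend(merged.evidence)

    def forget(self, threshold=0.5):
        """Forgetting: discard chunks whose strength is below the threshold."""
        self.preferences = {
            k: c for k, c in self.preferences.items() if c.strength >= threshold
        }

    def resynthesize(self):
        """Resynthesis: rebuild the profile (string join stands in for an LLM)."""
        ranked = sorted(
            self.preferences.values(), key=lambda c: c.strength, reverse=True
        )
        self.profile = "User prefers: " + ", ".join(c.text for c in ranked)
        return self.profile


state = BeliefState()
state.observe("sci-fi novels", "bought 'Dune'")
state.observe("cookbooks", "viewed a baking book")
state.extract()
state.reinforce("sci-fi novels")
state.weaken("cookbooks", 0.75)
state.forget()
profile = state.resynthesize()
```

In this sketch the operations run in a fixed, hand-written order. In MARS, as the abstract describes it, an LLM-based planner decides which operations to run and when; that scheduling logic is not reproduced here.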