🤖 AI Summary
This work addresses the challenge that existing personalized large language models struggle to effectively capture dynamically evolving user preferences over long-horizon tasks, often facing a trade-off between memory quality and efficiency. To overcome this limitation, we propose a dual-path memory-augmented framework that integrates a dynamic memory evolution mechanism to refine the write process and employs a context-distillation-based self-learning strategy to enhance retrieval accuracy, thereby internalizing users’ long-term experiences. Implemented on the Qwen-3 model series, our approach achieves a Pareto improvement—delivering high-fidelity, robust personalized responses while substantially reducing token overhead. Experimental results demonstrate that our system significantly outperforms state-of-the-art baselines such as Mem0 and Memory-R1 on long-horizon evaluation benchmarks.
📝 Abstract
Personalized large language models (PLLMs) have garnered significant attention for their ability to align outputs with individual users' needs and preferences. However, they still struggle with long-horizon tasks, such as tracking a user's extensive history of conversations or activities. Existing memory mechanisms often fail to capture evolving behaviors, and RAG paradigms are trapped in a quality-efficiency trade-off. Meanwhile, parametric adaptation is bottlenecked by the train-inference gap caused by the scarcity of labeled data. To enhance the long-horizon capabilities of PLLMs, we introduce TSUBASA, a two-pronged approach that improves memory writing via dynamic memory evolution and memory reading via self-learning with a context-distillation objective that internalizes user experiences. Extensive evaluations on long-horizon benchmarks with the Qwen-3 model family (4B to 32B) validate the effectiveness of TSUBASA, which surpasses competitive memory-augmented systems that rely primarily on memory writing, such as Mem0 and Memory-R1. Our analyses further confirm that TSUBASA breaks the quality-efficiency barrier and achieves Pareto improvements, delivering robust, high-fidelity personalization with a reduced token budget.
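To make the dual-path idea concrete, the sketch below shows a toy memory store with an explicit write path that *evolves* entries (reinforcing or updating overlapping memories instead of appending blindly) and a read path that retrieves the most relevant entries under a small budget. This is purely our own illustration: the class and method names, the token-overlap similarity, and the strength heuristic are hypothetical stand-ins, not TSUBASA's actual mechanism (which uses learned retrieval and context distillation).

```python
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    text: str
    strength: float = 1.0  # grows as the entry is reinforced by new observations


class DualPathMemory:
    """Toy dual-path memory: a write path that evolves existing entries,
    and a read path that returns the top-k most relevant entries."""

    def __init__(self):
        self.entries: list[MemoryEntry] = []

    def _overlap(self, a: str, b: str) -> float:
        # Crude Jaccard token overlap; a real system would use embeddings.
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / max(1, len(ta | tb))

    def write(self, observation: str, merge_threshold: float = 0.5) -> None:
        # Write path (stand-in for "dynamic memory evolution"): if the new
        # observation overlaps an existing entry, reinforce and update it
        # in place; otherwise store it as a fresh entry.
        for e in self.entries:
            if self._overlap(e.text, observation) >= merge_threshold:
                e.strength += 1.0
                e.text = observation  # keep the most recent phrasing
                return
        self.entries.append(MemoryEntry(observation))

    def read(self, query: str, k: int = 2) -> list[str]:
        # Read path: rank entries by similarity weighted by strength,
        # then keep only the top-k to bound the token budget.
        ranked = sorted(
            self.entries,
            key=lambda e: self._overlap(e.text, query) * e.strength,
            reverse=True,
        )
        return [e.text for e in ranked[:k]]


if __name__ == "__main__":
    mem = DualPathMemory()
    mem.write("user prefers concise answers")
    mem.write("user prefers concise short answers")  # merges into the first entry
    mem.write("user lives in Tokyo")                  # distinct, stored separately
    print(mem.read("concise answers", k=1))
```

The key design point the sketch mirrors is that memory quality is decided at write time (merging related observations keeps the store small and current), while efficiency is decided at read time (only a bounded, ranked slice of memory reaches the model's context).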