TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation

📅 2026-04-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that existing personalized large language models struggle to effectively capture dynamically evolving user preferences over long-horizon tasks, often facing a trade-off between memory quality and efficiency. To overcome this limitation, we propose a dual-path memory-augmented framework that integrates a dynamic memory evolution mechanism to refine the write process and employs a context-distillation-based self-learning strategy to enhance retrieval accuracy, thereby internalizing users’ long-term experiences. Implemented on the Qwen-3 model series, our approach achieves a Pareto improvement—delivering high-fidelity, robust personalized responses while substantially reducing token overhead. Experimental results demonstrate that our system significantly outperforms state-of-the-art baselines such as Mem0 and Memory-R1 on long-horizon evaluation benchmarks.
📝 Abstract
Personalized large language models (PLLMs) have garnered significant attention for their ability to align outputs with individuals' needs and preferences. However, they still struggle with long-horizon tasks, such as tracking a user's extensive history of conversations or activities. Existing memory mechanisms often fail to capture evolving behaviors, and RAG paradigms are trapped by a quality-efficiency tradeoff. Meanwhile, parametric adaptation is bottlenecked by the train-inference gap due to the scarcity of labeled data. To enhance the long-horizon capabilities of PLLMs, we introduce TSUBASA, a two-pronged approach designed to improve memory writing via dynamic memory evolution, and memory reading via self-learning with a context distillation objective to internalize user experiences. Extensive evaluations on long-horizon benchmarks using the Qwen-3 model family (4B to 32B) validate the effectiveness of TSUBASA, surpassing competitive memory-augmented systems that rely primarily on memory writing, such as Mem0 and Memory-R1. Our analyses further confirm that TSUBASA breaks the quality-efficiency barrier to achieve Pareto improvements, delivering robust, high-fidelity personalization with a reduced token budget.
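The abstract does not spell out the context-distillation objective, but the general idea of such objectives is to train a student that sees only the bare query to match a teacher that additionally sees the retrieved memory, so the memory's effect is internalized into the model's parameters. A minimal sketch of that loss (all names and the toy distributions here are hypothetical, not the paper's implementation):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) between two discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def context_distillation_loss(teacher_probs, student_probs):
    """Average per-token KL between a teacher conditioned on retrieved
    user memory and a student conditioned on the query alone.
    Minimizing this pushes the student to reproduce the memory-informed
    behavior without paying the retrieval token cost at inference."""
    per_token = [kl_divergence(t, s) for t, s in zip(teacher_probs, student_probs)]
    return sum(per_token) / len(per_token)

# Toy next-token distributions over a 3-word vocabulary.
teacher = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]   # prompt includes memory
student = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]   # identical -> zero loss
assert context_distillation_loss(teacher, student) == 0.0
```

In practice the divergence would be computed over model logits with a framework such as PyTorch; the scalar version above only illustrates the shape of the objective.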
Problem

Research questions and friction points this paper is trying to address.

personalized large language models
long-horizon personalization
memory evolution
quality-efficiency tradeoff
train-inference gap
Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic memory evolution
context distillation
self-learning
long-horizon personalization
memory-augmented LLM
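The "dynamic memory evolution" contribution refines the memory write path so that stored user facts can change as preferences drift, rather than only accumulating. The paper's mechanism is not detailed in this summary; as a rough illustration, memory-augmented systems in this space (e.g. Mem0) typically choose among add/update/delete/no-op write operations, which can be sketched as (all names hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Toy key-value user memory with an evolving write path."""
    entries: dict = field(default_factory=dict)

    def evolve(self, key: str, value: str, op: str) -> None:
        """Apply one write decision: ADD a new fact, UPDATE a stale
        one, DELETE an invalidated one, or NOOP to leave it alone."""
        if op == "ADD":
            self.entries.setdefault(key, value)
        elif op == "UPDATE":
            self.entries[key] = value          # overwrite stale preference
        elif op == "DELETE":
            self.entries.pop(key, None)
        # "NOOP": the new observation adds nothing; store is unchanged

mem = MemoryStore()
mem.evolve("favorite_drink", "coffee", "ADD")
mem.evolve("favorite_drink", "tea", "UPDATE")  # preference evolved over time
assert mem.entries["favorite_drink"] == "tea"
```

In a real system the operation would be predicted by an LLM from the new conversation turn plus the retrieved entry, which is what lets the store track evolving behavior instead of growing monotonically.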