AI Summary
This work addresses the challenge of on-device large language models, where naively concatenating user memories quickly exhausts limited context windows, while simple averaging for memory compression suffers from semantic conflicts that degrade personalized generation. To overcome this, the paper introduces, for the first time, a clustering-based memory compression mechanism tailored for on-device settings. It groups memories according to the similarity of their embeddings, fuses memories within each cluster, and then integrates the compressed representations into the prompt in a context-aware manner. This approach substantially reduces the number of memory tokens while effectively mitigating semantic interference. Under a fixed context budget, it consistently outperforms both direct concatenation and average-based compression baselines, achieving a favorable balance between computational efficiency and the quality of personalized generation.
Abstract
Large language models (LLMs) often rely on user-specific memories distilled from past interactions to enable personalized generation. A common practice is to concatenate these memories with the input prompt, but this approach quickly exhausts the limited context available in on-device LLMs. Compressing memories by averaging can mitigate context growth, yet it frequently harms performance due to semantic conflicts across heterogeneous memories. In this work, we introduce a clustering-based memory compression strategy that balances context efficiency and personalization quality. Our method groups memories by similarity and merges them within clusters prior to concatenation, thereby preserving coherence while reducing redundancy. Experiments demonstrate that our approach substantially lowers the number of memory tokens while outperforming baseline strategies such as naive averaging or direct concatenation. Furthermore, for a fixed context budget, clustering-driven merging yields more compact memory representations and consistently enhances generation quality.
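The abstract describes a three-step pipeline: group memories by embedding similarity, fuse the memories within each cluster, and concatenate only the fused representatives. The paper's own clustering algorithm and hyperparameters are not specified here, so the following Python sketch is an illustration under assumptions: memory embeddings are assumed to be precomputed, and the greedy cosine-threshold grouping (with a hypothetical `sim_threshold` parameter) stands in for whatever clustering method the authors actually use.

```python
import numpy as np

def compress_memories(embeddings: np.ndarray, sim_threshold: float = 0.8):
    """Cluster memory embeddings and fuse each cluster by averaging.

    Greedy sketch: each memory joins the first cluster whose seed vector
    has cosine similarity >= sim_threshold, otherwise it starts a new
    cluster. Averaging within a cluster mixes only semantically similar
    memories, avoiding the conflicts of averaging everything at once.
    """
    # Normalize rows so dot products are cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    seeds, clusters = [], []  # seed vector and member indices per cluster
    for i, v in enumerate(normed):
        for seed, members in zip(seeds, clusters):
            if float(v @ seed) >= sim_threshold:
                members.append(i)
                break
        else:  # no sufficiently similar cluster found: open a new one
            seeds.append(v)
            clusters.append([i])
    # Fuse each cluster into one representative by averaging its members.
    fused = np.stack([embeddings[m].mean(axis=0) for m in clusters])
    return fused, clusters
```

With three toy memories, two near-duplicates and one unrelated, the sketch fuses the duplicates into a single representative and keeps the outlier separate, so the prompt receives two memory vectors instead of three:

```python
mems = np.array([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]])
fused, clusters = compress_memories(mems)
# clusters == [[0, 1], [2]]; fused has shape (2, 2)
```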