🤖 AI Summary
This work addresses the challenge of maintaining long-term, personalized multimodal memory for large language model agents on resource-constrained edge devices, where high storage overhead and the difficulty of preserving semantic consistency are the main bottlenecks. The authors propose ScrapMem, a framework whose biologically inspired Optical Forgetting mechanism progressively reduces the resolution of older multimodal memories. ScrapMem also organizes key events into an Episodic Memory Graph (EM-Graph) with a causal-temporal structure to preserve semantic coherence. On ATM-Bench, ScrapMem achieves a new state-of-the-art Joint@10 score of 51.0%, reduces memory usage by up to 93%, and raises Recall@10 to 70.3%.
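The idea of progressively lowering the resolution of older memories can be illustrated with a small sketch. The summary does not state the actual decay schedule, so everything below is an assumption made for illustration: the names `Resolution`, `forgotten_resolution`, and `storage_saving` are hypothetical, the schedule halves each image side every `half_life_days`, and pixel count is used only as a rough proxy for storage cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resolution:
    width: int
    height: int

    @property
    def pixels(self) -> int:
        return self.width * self.height


def forgotten_resolution(
    original: Resolution,
    age_days: float,
    half_life_days: float = 30.0,  # assumed: each side halves every 30 days
    min_side: int = 32,            # assumed floor so old pages stay legible
) -> Resolution:
    """Return the target resolution for a memory that is `age_days` old.

    Hypothetical schedule, not the one used by ScrapMem.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    scale = 0.5 ** (age_days / half_life_days)
    width = max(min_side, round(original.width * scale))
    height = max(min_side, round(original.height * scale))
    # Never upscale: a source already below the floor keeps its own size.
    return Resolution(min(width, original.width), min(height, original.height))


def storage_saving(original: Resolution, compressed: Resolution) -> float:
    """Fraction of pixels removed, used here as a crude storage proxy."""
    return 1.0 - compressed.pixels / original.pixels
```

Under these assumed parameters, a page stored at 1024×1024 would be kept at 512×512 after 30 days and at 256×256 after 60 days. Real storage savings depend on the image codec and content, so the pixel ratio should not be read as the reported memory reduction.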
📝 Abstract
Long-term personalized memory for LLM agents is challenging on resource-limited edge devices due to high storage costs and multimodal complexity. To address this, we propose ScrapMem, a framework that integrates multimodal data into "Scrapbook Pages." ScrapMem introduces Optical Forgetting, an optical compression mechanism that progressively reduces the resolution of older memories, lowering storage cost while suppressing low-value details. To maintain semantic consistency, we construct an Episodic Memory Graph (EM-Graph) that organizes key events into a causal-temporal structure. Extensive experiments on the multimodal ATM-Bench show that ScrapMem provides three main benefits: (1) strong performance, achieving a new state-of-the-art Joint@10 score of 51.0%; (2) high storage efficiency, reducing memory usage by up to 93% via Optical Forgetting; and (3) improved recall, increasing Recall@10 to 70.3% through structured aggregation. ScrapMem offers an effective and storage-efficient solution for on-device long-term memory in multimodal LLM agents.
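To make the notion of a causal-temporal event structure more concrete, the following is a minimal sketch of one way such a graph could be represented. The abstract does not describe the EM-Graph's schema or retrieval procedure, so the class `EpisodicGraph`, its methods, the two edge labels, and the hop-limited neighborhood lookup are all hypothetical choices made for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    timestamp: int  # e.g. Unix seconds
    summary: str


class EpisodicGraph:
    """Toy event graph with forward-in-time 'temporal' and 'causal' edges."""

    RELATIONS = {"temporal", "causal"}

    def __init__(self) -> None:
        self.events: dict[str, Event] = {}
        self.relations: dict[tuple[str, str], str] = {}
        self.successors: defaultdict[str, set[str]] = defaultdict(set)
        self.predecessors: defaultdict[str, set[str]] = defaultdict(set)

    def add_event(self, event: Event) -> None:
        self.events[event.event_id] = event

    def link(self, source_id: str, target_id: str, relation: str) -> None:
        if relation not in self.RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        source, target = self.events[source_id], self.events[target_id]
        # Assumed invariant: an edge may never point backward in time.
        if source.timestamp > target.timestamp:
            raise ValueError("edges must point forward in time")
        self.relations[(source_id, target_id)] = relation
        self.successors[source_id].add(target_id)
        self.predecessors[target_id].add(source_id)

    def neighborhood(self, seed_id: str, max_hops: int = 1) -> list[Event]:
        """Events within `max_hops` of the seed, in chronological order."""
        seen = {seed_id}
        queue = deque([(seed_id, 0)])
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue
            for neighbor in self.successors[node] | self.predecessors[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, hops + 1))
        return sorted(
            (self.events[i] for i in seen),
            key=lambda e: (e.timestamp, e.event_id),
        )
```

In this sketch, a retrieval hit on one event can be expanded into its linked neighbors and returned in time order, which is one plausible reading of "structured aggregation"; the paper's actual procedure may differ.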