AI Summary
This work addresses the high storage and computational overhead of long-term agent memory in streaming video understanding. The authors propose StreamMeCo, a framework that combines graph-structure-aware memory compression with a time-decay memory retrieval mechanism. Treating memory as a graph, the method applies edge-free minmax sampling to isolated nodes and edge-aware weight pruning to connected nodes, evicting redundant memory nodes. The time-decay mechanism then offsets the accuracy loss caused by compression by prioritizing temporally relevant information. On three benchmark datasets, StreamMeCo achieves a 1.87× speedup in memory retrieval at a 70% memory graph compression rate while improving average accuracy by 1.0%.
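The summary does not give the formula behind the time-decay mechanism, so the following is only a minimal sketch of one plausible reading: each memory's similarity to the query is multiplied by an exponential decay in its age. The function name `time_decay_retrieve`, the `decay_rate` parameter, the exponential form, and the `(embedding, timestamp, payload)` layout are all illustrative assumptions, not the authors' implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def time_decay_retrieve(query, memories, now, top_k=2, decay_rate=0.01):
    """Rank memories by similarity weighted by an age-based decay.

    memories: list of (embedding, timestamp, payload) tuples.
    ASSUMPTION: score = cosine(query, emb) * exp(-decay_rate * age);
    the paper may use a different decay function or time reference.
    """
    scored = []
    for emb, ts, payload in memories:
        age = max(0.0, now - ts)  # older memories receive a smaller weight
        score = cosine(query, emb) * math.exp(-decay_rate * age)
        scored.append((score, payload))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [payload for _, payload in scored[:top_k]]


memories = [
    ([1.0, 0.0], 0.0, "old"),
    ([1.0, 0.0], 100.0, "new"),
    ([0.0, 1.0], 100.0, "other"),
]
top = time_decay_retrieve([1.0, 0.0], memories, now=100.0)
```

In this toy setup, two memories match the query equally well, and the decay term alone decides that the more recent one ranks first.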
Abstract
Vision agent memory has shown remarkable effectiveness in streaming video understanding. However, storing such memory for videos incurs substantial memory overhead, leading to high costs in both storage and computation. To address this issue, we propose StreamMeCo, an efficient Stream Agent Memory Compression framework. Specifically, based on the connectivity of the memory graph, StreamMeCo introduces edge-free minmax sampling for isolated nodes and edge-aware weight pruning for connected nodes, evicting redundant memory nodes while maintaining accuracy. In addition, we introduce a time-decay memory retrieval mechanism to further eliminate the performance degradation caused by memory compression. Extensive experiments on three challenging benchmark datasets (M3-Bench-robot, M3-Bench-web and Video-MME-Long) demonstrate that under 70% memory graph compression, StreamMeCo achieves a 1.87× speedup in memory retrieval while delivering an average accuracy improvement of 1.0%. Our code is available at https://github.com/Celina-love-sweet/StreamMeCo.
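To make the connectivity-based split concrete, here is a rough sketch under stated assumptions. The abstract does not define the two operators, so this example reads "edge-free minmax sampling" as greedy max-min (farthest-point) selection over the embeddings of isolated nodes, and "edge-aware weight pruning" as keeping the connected nodes with the largest total incident edge weight. The function names, the `keep_ratio` parameter, and the data layout are hypothetical; the released repository should be consulted for the actual method.

```python
import math


def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minmax_sample(embeddings, ids, budget):
    """ASSUMED reading: greedy farthest-point selection among isolated nodes."""
    if budget <= 0 or not ids:
        return []
    selected = [ids[0]]
    # Distance from every candidate to its nearest already-selected node.
    min_d = {i: _dist(embeddings[i], embeddings[ids[0]]) for i in ids}
    while len(selected) < min(budget, len(ids)):
        nxt = max((i for i in ids if i not in selected), key=lambda i: min_d[i])
        selected.append(nxt)
        for i in ids:
            min_d[i] = min(min_d[i], _dist(embeddings[i], embeddings[nxt]))
    return selected


def weight_prune(edges, ids, budget):
    """ASSUMED reading: keep nodes with the largest summed edge weight."""
    strength = {i: 0.0 for i in ids}
    for (u, v), w in edges.items():
        strength[u] += w
        strength[v] += w
    return sorted(ids, key=lambda i: strength[i], reverse=True)[:budget]


def compress(embeddings, edges, keep_ratio=0.3):
    """Split nodes by connectivity, then compress each group separately.

    embeddings: {node_id: vector}; edges: {(u, v): weight}.
    keep_ratio=0.3 corresponds to a 70% compression rate.
    """
    connected = {n for edge in edges for n in edge}
    conn_ids = sorted(connected)
    iso_ids = sorted(set(embeddings) - connected)
    keep_conn = weight_prune(edges, conn_ids, math.ceil(keep_ratio * len(conn_ids)))
    keep_iso = minmax_sample(embeddings, iso_ids, math.ceil(keep_ratio * len(iso_ids)))
    return set(keep_conn) | set(keep_iso)


embeddings = {
    0: [0.0, 0.0], 1: [0.1, 0.0], 2: [10.0, 0.0], 3: [5.0, 0.0],  # isolated
    4: [1.0, 1.0], 5: [2.0, 2.0], 6: [3.0, 3.0],                  # connected
}
edges = {(4, 5): 0.9, (5, 6): 0.2}
kept = compress(embeddings, edges, keep_ratio=0.5)
```

With these toy inputs, the near-duplicate isolated node 1 is evicted in favor of the distant node 2, and the weakly connected node 6 is pruned.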