🤖 AI Summary
This work addresses the challenge of balancing bounded memory storage with flexible recall in infinite-horizon streaming dialogues. To this end, we propose ProStream, a proactive hierarchical memory framework tailored to streaming conversations. ProStream employs multi-granularity memory distillation and adaptive spatiotemporal optimization to maintain a bounded knowledge state while enabling efficient, on-demand retrieval. We further introduce STEM-Bench, the first benchmark specifically designed for evaluating streaming memory systems, and demonstrate that ProStream consistently outperforms existing methods in both recall accuracy and inference efficiency. These results show that ProStream alleviates the fundamental trade-off between memory fidelity and computational efficiency in long-duration dialogue systems.
📝 Abstract
Real-world dialogue usually unfolds as an infinite stream, and thus requires bounded-state memory mechanisms that can operate over an infinite horizon. However, existing read-then-think memory is fundamentally misaligned with this setting, as it cannot support ad-hoc memory recall while the stream unfolds. To explore this challenge, we introduce **STEM-Bench**, the first benchmark for **ST**reaming **E**valuation of **M**emory. It comprises over 14K QA pairs embedded in dialogue streams that assess perception fidelity, temporal reasoning, and global awareness under infinite-horizon constraints. Preliminary analysis on STEM-Bench reveals a critical *fidelity-efficiency dilemma*: retrieval-based methods rely on fragmented context, while full-context models incur unbounded latency. To resolve this, we propose **ProStream**, a proactive hierarchical memory framework for streaming dialogues. It enables ad-hoc, on-demand memory recall by reasoning over continuous streams with multi-granular distillation. Moreover, it employs Adaptive Spatiotemporal Optimization to dynamically adjust retention based on expected utility, maintaining a bounded knowledge state that lowers inference latency without sacrificing reasoning fidelity. Experiments show that ProStream outperforms baselines in both accuracy and efficiency.