AI Summary
Most existing task-oriented dialogue (TOD) systems are designed for single-session interactions and thus struggle to model and leverage long-term, cross-session memory. To address this memory deficiency in multi-session TOD, we introduce MS-TOD, the first benchmark dataset explicitly supporting long-term memory modeling. We further propose a two-stage Memory-Active Policy (MAP) that combines memory-guided dialogue planning (cross-session retrieval of intent-aligned history, QA-unit refinement, and redundancy filtering) with a proactive response strategy that detects and corrects errors. Experiments demonstrate that MAP significantly improves task success rate and turn efficiency on MS-TOD, while maintaining competitive performance on standard single-session benchmarks (e.g., MultiWOZ). This work is the first to systematically define and tackle the problem of long-term memory modeling in multi-session task-oriented dialogue.
Abstract
Existing Task-Oriented Dialogue (TOD) systems primarily focus on single-session dialogues, which limits their ability to use long-term memory. To address this challenge, we introduce MS-TOD, the first multi-session TOD dataset designed to retain long-term memory across sessions, enabling task completion in fewer turns. This defines a new benchmark task for evaluating long-term memory in multi-session TOD. Based on this new dataset, we propose a Memory-Active Policy (MAP) that improves multi-session dialogue efficiency through a two-stage approach. 1) Memory-Guided Dialogue Planning retrieves intent-aligned history, identifies key QA units via a memory judger, refines them by removing redundant questions, and generates responses based on the reconstructed memory. 2) Proactive Response Strategy detects and corrects errors or omissions, ensuring efficient and accurate task completion. We evaluate MAP on the MS-TOD dataset, focusing on response quality and the effectiveness of the proactive strategy. Experiments on MS-TOD demonstrate that MAP significantly improves task success and turn efficiency in multi-session scenarios, while maintaining competitive performance on conventional single-session tasks.
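To make the two-stage flow concrete, the following is a minimal toy sketch, not the authors' implementation. The `QAUnit` schema, all function names, the exact-match retrieval, the rule-based "judger", and the sample data are assumptions made for illustration only; the paper's components are presumably model-based.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAUnit:
    """One question-answer pair from a past session (hypothetical schema)."""
    session_id: int
    intent: str
    slot: str
    answer: str


def retrieve_intent_aligned(memory, intent):
    """Stage 1a: keep only QA units whose intent matches the current one."""
    return [u for u in memory if u.intent == intent]


def judge_key_units(units, required_slots):
    """Stage 1b: toy stand-in for a memory judger; keeps units that fill a required slot."""
    return [u for u in units if u.slot in required_slots]


def remove_redundant(units):
    """Stage 1c: for each slot, keep only the unit from the most recent session."""
    latest = {}
    for u in sorted(units, key=lambda u: u.session_id):
        latest[u.slot] = u  # later sessions overwrite earlier ones
    return latest


def plan_questions(memory, intent, required_slots):
    """Return (slots recalled from memory, slots that still must be asked)."""
    units = retrieve_intent_aligned(memory, intent)
    units = judge_key_units(units, required_slots)
    recalled = {slot: u.answer for slot, u in remove_redundant(units).items()}
    missing = [s for s in required_slots if s not in recalled]
    return recalled, missing


def proactive_check(recalled, user_turn):
    """Stage 2: flag recalled slots that the user's current turn contradicts."""
    return {s: (recalled[s], v) for s, v in user_turn.items()
            if s in recalled and recalled[s] != v}


# Hypothetical memory spanning two earlier sessions.
memory = [
    QAUnit(1, "book_restaurant", "cuisine", "italian"),
    QAUnit(1, "book_restaurant", "party_size", "2"),
    QAUnit(2, "book_restaurant", "cuisine", "thai"),
    QAUnit(2, "book_hotel", "area", "centre"),
]
required = ["cuisine", "party_size", "time"]

recalled, missing = plan_questions(memory, "book_restaurant", required)
conflicts = proactive_check(recalled, {"party_size": "4"})
print(recalled, missing, conflicts)
```

In this sketch only the `time` slot remains to be asked, which is the sense in which reused memory can shorten a dialogue, and the mismatch on `party_size` is the kind of discrepancy a proactive strategy would surface for correction rather than silently reuse.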