Improving Multi-turn Dialogue Consistency with Self-Recall Thinking

📅 2026-05-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of modeling long-range dependencies in multi-turn dialogue systems, which often leads to degraded consistency and efficiency bottlenecks. The authors propose the Self-Recall Thinking framework, an end-to-end approach that incorporates an intrinsic self-recall mechanism to selectively retrieve and reason over critical historical utterances during inference, thereby constructing interpretable reasoning chains without relying on external modules. The framework integrates dependency construction, capability initialization, and a verifiable reward–based optimization strategy to dynamically generate recall tokens that guide contextual utilization. Experimental results demonstrate that the proposed method achieves a 4.7% improvement in F1 score and reduces end-to-end latency by 14.7% across multiple datasets, significantly outperforming state-of-the-art baselines while striking a superior balance between detail retention and reasoning efficiency.

📝 Abstract

Multi-turn dialogue systems based on large language models (LLMs) often struggle to track dependencies across non-adjacent turns, which undermines both consistency and scalability. As conversations lengthen, essential information becomes sparse and is buried in irrelevant context, while processing the entire dialogue history incurs severe efficiency bottlenecks. Existing solutions either rely on high-latency external memory or lose fine-grained details through iterative summarization. In this paper, we propose Self-Recall Thinking (SRT), a framework designed to address long-range contextual dependencies and sparse informative signals in multi-turn dialogue. SRT identifies helpful historical turns and uses them to generate contextually appropriate responses, enabling the model to selectively recall and reason over context during inference. The result is an endogenous reasoning process that integrates interpretable recall steps without external modules. SRT incorporates three stages: (1) Dependency Construction: generating turn dependencies and converting them into self-recall chains; (2) Capability Initialization: training the model to produce reasoning chains that contain recall tokens; (3) Reasoning Improvement: refining accuracy with verifiable rewards that optimize recall and reasoning toward correct answers. Experiments on multiple datasets show that SRT improves F1 score by 4.7% and reduces end-to-end latency by 14.7% over prior methods, balancing reasoning latency against accuracy and outperforming state-of-the-art baselines.
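To make the recall-then-reward idea concrete, here is a minimal Python sketch of how recall tokens might be parsed from a model output and scored with a verifiable reward. This is an illustration under stated assumptions, not the authors' implementation: the `<recall>N</recall>` token format, the function names, the substring-based answer check, and the `recall_weight` mixing parameter are all hypothetical, since the abstract does not specify any of them.

```python
import re
from typing import List, Set

# Hypothetical recall-token format; the paper's actual format is not given.
RECALL_PATTERN = re.compile(r"<recall>(\d+)</recall>")


def parse_recalled_turns(output: str) -> Set[int]:
    """Collect the turn indices named by recall tokens in a model output."""
    return {int(idx) for idx in RECALL_PATTERN.findall(output)}


def select_context(history: List[str], recalled: Set[int]) -> List[str]:
    """Keep only the recalled turns, in order, ignoring out-of-range indices."""
    return [history[i] for i in sorted(recalled) if 0 <= i < len(history)]


def set_f1(predicted: Set[int], gold: Set[int]) -> float:
    """F1 between predicted and gold turn-index sets (0.0 if no overlap)."""
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def verifiable_reward(
    output: str,
    gold_turns: Set[int],
    gold_answer: str,
    recall_weight: float = 0.5,  # assumed mixing weight, not from the paper
) -> float:
    """Mix recall quality with answer correctness into one scalar reward."""
    recalled = parse_recalled_turns(output)
    # Strip recall tokens so only the response text is checked for the answer.
    answer_text = RECALL_PATTERN.sub("", output).strip()
    answer_ok = 1.0 if gold_answer.lower() in answer_text.lower() else 0.0
    return recall_weight * set_f1(recalled, gold_turns) + (1 - recall_weight) * answer_ok


history = [
    "User: My flight to Oslo leaves at 9 am.",
    "Assistant: Noted.",
    "User: I also need a hotel near the airport.",
    "User: When does my flight leave again?",
]
output = "<recall>0</recall><recall>2</recall> Your flight leaves at 9 am."
context = select_context(history, parse_recalled_turns(output))
reward = verifiable_reward(output, gold_turns={0}, gold_answer="9 am")
```

In this example the output recalls turns 0 and 2 while only turn 0 is needed, so the recall term is penalized for the extra turn even though the answer is correct. A reward of this shape can be checked automatically against labeled dependencies, which is the sense in which it is "verifiable" here.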

Problem

Research questions and friction points this paper is trying to address.

multi-turn dialogue

contextual consistency

long-range dependency

information sparsity

dialogue scalability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Recall Thinking

multi-turn dialogue

long-range dependency