Dialogue Response Prefetching Based on Semantic Similarity and Prediction Confidence of Language Model

📅 2025-08-06
🤖 AI Summary
To reduce user-perceived latency (UPL) in spoken dialogue systems, this paper proposes a response prefetching mechanism that combines semantic similarity with token-level confidence from a language model. The core contribution is the Prediction Confidence Model (PCM), which decides when to trigger response generation by estimating, in real time, how semantically close the utterance predicted from partial speech is to the user's actual complete utterance, together with the language model's token-level confidence scores. Responses can therefore be predicted and precomputed *before* the user finishes speaking, but only when the prediction is judged semantically reliable, which avoids purely speculative prefetching. Experiments demonstrate that PCM significantly improves prefetching accuracy, reduces redundant computation, lowers average UPL by 23.6%, and decreases response first-token latency by 19.4%, all while preserving ASR and NLU accuracy, thereby improving end-to-end interaction timeliness.

📝 Abstract
Prefetching of dialogue responses has been investigated to reduce user-perceived latency (UPL), which refers to the user's waiting time before receiving the system's response, in spoken dialogue systems. To reduce the UPL, it is necessary to predict complete user utterances before the end of the user's speech, typically with language models, so that dialogue responses can be prepared in advance. In this study, we propose a prediction confidence model (PCM) that determines whether prefetching is possible by estimating the semantic similarity between the predicted complete user utterance and the actual complete user utterance. We evaluated our PCM based on the differences between the predicted and the actual complete user utterances.
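The gating idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `predict_complete`, `estimate_similarity`, the `COMPLETIONS` lookup table, and the 0.75 threshold are all hypothetical stand-ins for a language model, a trained PCM, and a tuned decision threshold.

```python
from typing import Optional

# Hypothetical stand-in for a language model that completes a partial utterance.
COMPLETIONS = {
    "what is the": "what is the weather today",
    "what is the weather": "what is the weather today",
}


def predict_complete(partial: str) -> str:
    """Return a predicted complete utterance (toy lookup, assumption)."""
    return COMPLETIONS.get(partial, partial)


def estimate_similarity(partial: str, predicted: str) -> float:
    """Toy PCM stand-in: fraction of the predicted words already heard.

    A real PCM would be a trained model that estimates semantic similarity
    to the (not yet known) complete utterance.
    """
    heard = len(partial.split())
    total = len(predicted.split())
    return heard / total if total else 0.0


def maybe_prefetch(partial: str, threshold: float = 0.75) -> Optional[str]:
    """Return the predicted utterance to prefetch a response for, or None."""
    predicted = predict_complete(partial)
    if estimate_similarity(partial, predicted) >= threshold:
        return predicted  # confident enough: start generating a response now
    return None  # not confident: keep listening


print(maybe_prefetch("what is the"))
print(maybe_prefetch("what is the weather"))
```

In this toy setup the first call declines to prefetch because too little of the utterance has been heard, while the second call returns the predicted utterance so that response generation could begin early.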
Problem

Research questions and friction points this paper is trying to address.

Reduce user-perceived latency in dialogue systems
Predict complete user utterances before speech ends
Estimate semantic similarity for prefetching decisions
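As a rough illustration of what a semantic similarity score between a predicted and an actual utterance could look like, the sketch below computes cosine similarity over bag-of-words counts. This is an assumption chosen for self-containment: the summary above does not state which similarity measure the paper uses, and a practical system would more likely compare sentence embeddings. The function name `bow_cosine` is hypothetical.

```python
import math
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


predicted = "book a table for two tonight"
actual = "book a table for two tomorrow"
print(round(bow_cosine(predicted, actual), 3))
```

A score near 1 would suggest the prediction is close enough that a prefetched response is likely still appropriate; a score near 0 would suggest the prefetched response should be discarded.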
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prefetching dialogue responses using semantic similarity
Employing a prediction confidence model to judge whether a predicted utterance is accurate enough to prefetch
Reducing latency with predicted complete user utterances
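To make the latency benefit concrete, here is a deliberately simplified timing model. It is an assumption for illustration only and is not taken from the paper: response generation takes a fixed time, a valid prefetch lets generation start before the user stops speaking, and an invalid prefetch is discarded so generation restarts at the end of speech. The function and parameter names are hypothetical.

```python
from typing import Optional


def user_perceived_latency(
    speech_end: float,
    generation_time: float,
    prefetch_start: Optional[float] = None,
    prefetch_valid: bool = False,
) -> float:
    """Seconds the user waits after finishing speaking (simplified model)."""
    if prefetch_start is not None and prefetch_valid:
        # Generation began early; the user only waits for whatever remains.
        return max(0.0, prefetch_start + generation_time - speech_end)
    # No usable prefetch: generation starts when the user stops speaking.
    return generation_time


# User stops speaking at t = 2.0 s; generating a response takes 1.0 s.
print(user_perceived_latency(2.0, 1.0))
print(user_perceived_latency(2.0, 1.0, prefetch_start=1.5, prefetch_valid=True))
print(user_perceived_latency(2.0, 1.0, prefetch_start=1.5, prefetch_valid=False))
```

Under this model an early, correct prefetch shortens the wait, while an incorrect one gives no latency benefit and wastes the computation already spent, which is why a confidence-based trigger matters.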
Kiyotada Mori
Nara Institute of Science and Technology, Japan; Guardian Robot Project, RIKEN, Japan
Seiya Kawano
Guardian Robot Project, RIKEN, Japan; Nara Institute of Science and Technology, Japan
Angel Fernando Garcia Contreras
Guardian Robot Project, RIKEN, Japan
Koichiro Yoshino
Tokyo Institute of Technology / GRP, RIKEN
spoken dialogue systems, natural language processing, spoken language processing, human robot