🤖 AI Summary
To address high user-perceived latency (UPL) in spoken dialogue systems, this paper proposes a response prefetching mechanism that jointly models semantic similarity and language-model token-level confidence. The core contribution is the Prediction Confidence Model (PCM), which dynamically triggers response generation by assessing, in real time, the semantic similarity between predictions made from the partial speech stream and the eventual complete utterance, integrated with token-level confidence scores from the language model. This enables semantically reliable prediction and precomputation of responses *before* the user finishes speaking, while avoiding wasteful speculative prefetching. Experiments demonstrate that the PCM significantly improves prefetching accuracy, reduces redundant computation, lowers average UPL by 23.6%, and decreases response first-token latency by 19.4%, all while preserving ASR and NLU accuracy—thereby improving end-to-end interaction timeliness.
📝 Abstract
Prefetching of dialogue responses has been investigated as a way to reduce user-perceived latency (UPL)—the user's waiting time before receiving the system's response—in spoken dialogue systems. To reduce UPL, the complete user utterance must be predicted before the end of the user's speech, typically with a language model, so that a dialogue response can be prepared in advance. In this study, we propose a prediction confidence model (PCM) that determines whether prefetching is feasible by estimating the semantic similarity between the predicted complete user utterance and the actual complete user utterance. We evaluated the PCM based on the differences between predicted and actual complete user utterances.
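The paper does not spell out the PCM's internals, but the described gating decision—trigger prefetching only when the predicted completion looks semantically reliable and the language model is confident—can be sketched as follows. This is a minimal illustration, not the authors' implementation: the similarity score is assumed to come from some estimator (e.g. an embedding-based model), the function names and thresholds are hypothetical, and token confidence is summarized here as the geometric mean of per-token probabilities.

```python
import math

def mean_token_confidence(token_logprobs):
    # Geometric mean of per-token probabilities, computed in log space.
    # A single low-probability token pulls this down sharply.
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def should_prefetch(estimated_similarity, token_logprobs,
                    sim_threshold=0.9, conf_threshold=0.6):
    """Hypothetical PCM-style gate.

    estimated_similarity: the PCM's estimate (in [0, 1]) of how close the
        predicted complete utterance is to the eventual complete utterance.
    token_logprobs: log-probabilities the language model assigned to each
        token of its predicted completion.
    Thresholds are illustrative placeholders, not values from the paper.
    """
    return (estimated_similarity >= sim_threshold
            and mean_token_confidence(token_logprobs) >= conf_threshold)

# A confident, semantically close prediction triggers prefetching;
# a dissimilar or low-confidence one does not.
confident = [math.log(0.9)] * 3   # geometric mean 0.9
uncertain = [math.log(0.3)] * 3   # geometric mean 0.3
print(should_prefetch(0.95, confident))  # True
print(should_prefetch(0.95, uncertain))  # False
print(should_prefetch(0.50, confident))  # False
```

In a real system, both signals would be recomputed as each partial ASR hypothesis arrives, so a prediction that fails the gate early can still trigger prefetching later in the utterance.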