The Silent Thought: Modeling Internal Cognition in Full-Duplex Spoken Dialogue Models via Latent Reasoning

📅 2026-03-18
🤖 AI Summary
This work proposes FLAIR, an approach to a limitation of conventional dialogue systems: they cannot reason while the user is still speaking, which holds them back from true full-duplex interaction. Inspired by humans' capacity to listen and think at the same time, FLAIR models this concurrent reasoning as a trainable latent inference process. Hidden states are fed back recursively from one step to the next, giving continuous, causally consistent reasoning without added latency and without explicit reasoning annotations. The method combines an ELBO-based objective, teacher-forcing fine-tuning, and a full-duplex spoken language understanding architecture. FLAIR achieves competitive results across multiple spoken dialogue benchmarks, handles conversational dynamics robustly, and attains competitive performance on full-duplex interaction metrics.
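
For orientation, the generic form of a conditional Evidence Lower Bound is shown below. This is the textbook bound, not necessarily the exact objective used in the paper. The symbols are assumptions introduced here for illustration: x is the user's speech input, y is the system response, z is the latent reasoning sequence, q_phi is an approximate posterior, and p_theta is the model.

```latex
\log p_\theta(y \mid x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x, y)}\big[\log p_\theta(y \mid x, z)\big]
  - \mathrm{KL}\big(q_\phi(z \mid x, y) \,\|\, p_\theta(z \mid x)\big)
```

Maximizing the right-hand side trains the latent variable through the quality of the response alone, which is consistent with the claim that no explicit reasoning annotations are required.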

πŸ“ Abstract
During conversational interactions, humans subconsciously engage in concurrent thinking while listening to a speaker. Although this internal cognitive processing may not always manifest as explicit linguistic structures, it is instrumental in formulating high-quality responses. Inspired by this cognitive phenomenon, we propose a novel Full-duplex LAtent and Internal Reasoning method named FLAIR that conducts latent thinking simultaneously with speech perception. Unlike conventional "thinking" mechanisms in NLP, which require post-hoc generation, our approach aligns seamlessly with spoken dialogue systems: during the user's speaking phase, it recursively feeds the latent embedding output from the previous step into the next step, enabling continuous reasoning that strictly adheres to causality without introducing additional latency. To enable this latent reasoning, we design an Evidence Lower Bound-based objective that supports efficient supervised finetuning via teacher forcing, circumventing the need for explicit reasoning annotations. Experiments demonstrate the effectiveness of this think-while-listening design, which achieves competitive results on a range of speech benchmarks. Furthermore, FLAIR robustly handles conversational dynamics and attains competitive performance on full-duplex interaction metrics.
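
The recursive feeding described in the abstract can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the FLAIR architecture: the function names `latent_step` and `think_while_listening`, the weights `w_in` and `w_rec`, and the tanh update rule are all hypothetical stand-ins for a learned model. It only shows the control flow, where the latent output of one step becomes an input to the next as frames arrive, so the latent at any step depends only on frames heard so far.

```python
import math


def latent_step(frame, prev_latent, w_in=0.5, w_rec=0.9):
    """One hypothetical update: mix the current frame with the previous latent."""
    return [math.tanh(w_in * f + w_rec * h) for f, h in zip(frame, prev_latent)]


def think_while_listening(frames, dim):
    """Run one latent update per incoming frame; return the latent after each frame."""
    latent = [0.0] * dim  # initial latent before any speech is heard
    trajectory = []
    for frame in frames:  # frames arrive in time order
        latent = latent_step(frame, latent)  # previous output feeds the next step
        trajectory.append(latent)
    return trajectory


# Toy 2-dimensional "speech frames" (made-up values for illustration).
frames = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]]
trajectory = think_while_listening(frames, dim=2)

# Causality check: altering the last frame leaves all earlier latents unchanged.
altered = frames[:2] + [[9.0, 9.0]]
assert think_while_listening(altered, dim=2)[:2] == trajectory[:2]
```

Because each update runs as its frame arrives, the latent state is already available when the user stops speaking, which is the sense in which this kind of design avoids a separate post-hoc thinking phase.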
Problem

Research questions and friction points this paper is trying to address.

full-duplex dialogue
latent reasoning
internal cognition
spoken dialogue systems
concurrent thinking
Innovation

Methods, ideas, or system contributions that make the work stand out.

latent reasoning
full-duplex dialogue
concurrent thinking
causal inference
teacher forcing