AI Summary
This work proposes FLAIR, an approach that addresses a limitation of conventional dialogue systems: they cannot reason while the user is still speaking, which prevents true full-duplex interaction. Inspired by humans' capacity to listen and think simultaneously, FLAIR models this concurrent reasoning as a trainable implicit inference process. It performs continuous, causally consistent reasoning by recursively propagating hidden states, without adding latency and without requiring explicit reasoning annotations. By combining an ELBO-based objective, teacher-forcing fine-tuning, and a full-duplex spoken language understanding architecture, FLAIR achieves competitive results across multiple spoken dialogue benchmarks, handles dynamic conversational contexts robustly, and performs competitively on full-duplex interaction metrics.
Abstract
During conversational interactions, humans subconsciously engage in concurrent thinking while listening to a speaker. Although this internal cognitive processing may not always manifest as explicit linguistic structures, it is instrumental in formulating high-quality responses. Inspired by this cognitive phenomenon, we propose a novel Full-duplex LAtent and Internal Reasoning method named FLAIR that conducts latent thinking simultaneously with speech perception. Unlike conventional "thinking" mechanisms in NLP, which require post-hoc generation, our approach aligns seamlessly with spoken dialogue systems: during the user's speaking phase, it recursively feeds the latent embedding output from the previous step into the next step, enabling continuous reasoning that strictly adheres to causality without introducing additional latency. To enable this latent reasoning, we design an Evidence Lower Bound-based objective that supports efficient supervised finetuning via teacher forcing, circumventing the need for explicit reasoning annotations. Experiments demonstrate the effectiveness of this think-while-listening design, which achieves competitive results on a range of speech benchmarks. Furthermore, FLAIR robustly handles conversational dynamics and attains competitive performance on full-duplex interaction metrics.
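The recursive latent feedback described in the abstract can be illustrated with a toy sketch. This is not FLAIR's implementation: the step function, its random weights, the vector dimension, and the names `make_toy_step` and `think_while_listening` are all hypothetical stand-ins, and the ELBO-based objective and teacher forcing are not shown. The sketch only illustrates the control flow: one latent update per incoming speech frame, with each update consuming the previous latent, so that the latent at any step depends only on frames heard so far.

```python
import math
import random


def make_toy_step(dim, seed=0):
    """Build a hypothetical single-step update (a stand-in, not FLAIR's network)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
    w_lat = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

    def step(frame, prev_latent):
        # New latent = tanh(W_in @ frame + W_lat @ prev_latent).
        return [
            math.tanh(
                sum(w_in[i][j] * frame[j] for j in range(dim))
                + sum(w_lat[i][j] * prev_latent[j] for j in range(dim))
            )
            for i in range(dim)
        ]

    return step


def think_while_listening(frames, step, dim):
    """Run one latent update per incoming frame and return the latent after each frame."""
    latent = [0.0] * dim  # initial latent state (assumed to be zeros here)
    trajectory = []
    for frame in frames:
        # The previous latent output is fed back in together with the new frame.
        latent = step(frame, latent)
        trajectory.append(latent)
    return trajectory


if __name__ == "__main__":
    dim = 4
    rng = random.Random(1)
    frames = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(6)]
    step = make_toy_step(dim)

    full = think_while_listening(frames, step, dim)
    prefix = think_while_listening(frames[:3], step, dim)

    # Causality check: latents for the first three frames do not depend on later frames.
    print(full[:3] == prefix)
```

Because each update runs as the frame arrives, the final latent is available as soon as the user stops speaking, which is the sense in which this kind of loop adds no post-hoc "thinking" phase before the response.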