Thinking-while-speaking: A Controlled, Interleaved Reasoning Method for Real-Time Speech Generation

📅 2026-05-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the significant challenge of simultaneously achieving deep reasoning and fluent expression in real-time spoken language generation. The authors propose InterRS, a novel approach that interleaves reasoning steps within natural speech pauses to enable “thinking while speaking” in a real-time setting. Key innovations include a pioneering data generation pipeline that produces thought-speech interleaved sequences with controllable length ratios, an interleaved supervised fine-tuning strategy, and a dual-reward reinforcement learning mechanism combining TA-Balance and Linguistic Quality metrics. Experimental results demonstrate that InterRS improves performance by 13% on mathematical and logical reasoning benchmarks while substantially enhancing the naturalness of synthesized speech and the coherence of spoken chain-of-thought reasoning.
📝 Abstract
The thinking-while-speaking paradigm aims to make AI communication more human. A key challenge is maintaining fluent speech while performing deep reasoning. Our method, InterRS, tackles this by inserting reasoning steps only during natural speech generation. This requires high-quality data in which reasoning and speech are precisely aligned and the length ratio between them is controlled. We introduce a novel pipeline to generate such seamlessly interleaved audio data. To train our model, we combine interleaved SFT on refined data with reinforcement learning using two new rewards: a TA-Balance Reward to manage timing and the thinking-answer ratio, and a Linguistic Quality Reward to refine expression. Experiments show that our approach achieves 13% better performance on mathematical and logic benchmarks while responding instantly, like a spoken-language instruct model that outputs a fast CoT response. Furthermore, our method generates more natural and fluent answers than prior methods.
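As a rough illustration of what controlling the thinking-to-speech length ratio could look like, the sketch below measures that ratio on an interleaved sequence and scores it against a target. It is not the paper's TA-Balance Reward, whose formula is not given here. The `Segment` type, the `"think"`/`"speak"` labels, the function names, and the linear decay are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One chunk of an interleaved sequence (hypothetical representation)."""
    kind: str          # "think" for reasoning, "speak" for spoken output
    tokens: List[str]


def think_speak_ratio(segments: List[Segment]) -> float:
    """Return (number of thinking tokens) / (number of spoken tokens)."""
    think = sum(len(s.tokens) for s in segments if s.kind == "think")
    speak = sum(len(s.tokens) for s in segments if s.kind == "speak")
    if speak == 0:
        raise ValueError("sequence contains no spoken tokens")
    return think / speak


def ratio_reward(segments: List[Segment], target_ratio: float,
                 tolerance: float) -> float:
    """Illustrative reward: 1.0 within tolerance of the target ratio,
    then a linear decay to 0.0 (an assumed shape, not the paper's)."""
    if target_ratio <= 0:
        raise ValueError("target_ratio must be positive")
    deviation = abs(think_speak_ratio(segments) - target_ratio)
    if deviation <= tolerance:
        return 1.0
    return max(0.0, 1.0 - (deviation - tolerance) / target_ratio)


# Example: reasoning chunks interleaved between spoken chunks.
balanced = [
    Segment("speak", ["Let", "me", "see"]),
    Segment("think", ["12", "*", "3"]),
    Segment("speak", ["that", "is"]),
    Segment("think", ["=", "36"]),
]
```

Here `balanced` has five thinking tokens and five spoken tokens, so its ratio is 1.0. A sequence that thinks far longer than it speaks would score lower against the same target.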
Problem

Research questions and friction points this paper is trying to address.

thinking-while-speaking
real-time speech generation
deep reasoning
fluency
interleaved reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

thinking-while-speaking
interleaved reasoning
real-time speech generation
reinforcement learning with rewards
controlled data alignment