🤖 AI Summary
Large language models often perform long chains of reasoning without user feedback, so erroneous premises are hard to correct promptly, wasting computation and reducing answer accuracy. To address this, we propose Interleaved Reasoning (IR), a framework that interleaves explicit, stepwise planning with progressive intermediate answer generation, making the reasoning process more observable and open to intervention. IR lets users step in early to correct the reasoning direction, mitigating error propagation. This is the first work to integrate explicit, structured stepwise planning directly into the language model's inference pipeline. Experiments on mathematical and programming benchmarks demonstrate that IR improves pass@1 by approximately 6% and reduces time-to-first-response by over 60%, significantly enhancing both interactive efficiency and final answer accuracy.
📝 Abstract
Reasoning models often spend a significant amount of time thinking before they generate a visible response. In the meantime, they give the user no hint as to whether their reasoning is on the right track, and no recourse to stop and correct them if it is flawed. This creates a frustrating, but unfortunately common, experience: the user's time is wasted while the model reasons from a false premise that could have easily been corrected. In contrast, human speakers typically perform lightweight, incremental grounding acts to ensure that participants in a conversation are on the same page; here we ask whether language models can learn to leverage a similar type of behavior. With this motivation, we propose interleaved reasoning (IR), in which the model alternates between thinking and surfacing intermediate responses, as an alternative to the standard "think-then-answer" approach. By providing useful information to the user earlier, IR reduces perceived latency, the time a user waits for an initial output, without compromising the quality of the final response. We further introduce a specialization of interleaved reasoning, Plantain (Plan-Thought-Answer Interleaving), where the first intermediate response is an explicit, step-by-step plan for executing the task. This plan-first strategy allows for user intervention and early feedback for subsequent reasoning steps. We demonstrate that Plantain yields an ~6% improvement in pass@1 across several challenging math reasoning and coding benchmarks, while reducing time-to-first-response by over 60% relative to think-then-answer baselines.
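To make the contrast between the two output formats concrete, here is a minimal sketch of what a think-then-answer trace versus a Plantain-style interleaved trace might look like. The tag names, helper functions, and trace structure are illustrative assumptions for exposition, not the paper's actual schema or implementation.

```python
# Illustrative sketch only: tags and structure are hypothetical, not the
# paper's exact format. Segments wrapped in <answer> are user-visible;
# segments wrapped in <think> are hidden reasoning.

def think_then_answer(thoughts, final):
    # Baseline: all reasoning is hidden; the user sees nothing until the end.
    return ["<think>" + " ".join(thoughts) + "</think>", final]

def plan_thought_answer_interleave(plan, thoughts, partials, final):
    # Plantain-style trace: surface an explicit plan first, then alternate
    # hidden thought segments with visible intermediate answers.
    trace = ["<answer>PLAN: " + "; ".join(plan) + "</answer>"]
    for thought, partial in zip(thoughts, partials):
        trace.append("<think>" + thought + "</think>")
        trace.append("<answer>" + partial + "</answer>")
    trace.append("<answer>" + final + "</answer>")
    return trace

def first_visible_index(trace):
    # Crude proxy for perceived latency: position of the first
    # user-visible segment in the trace.
    for i, segment in enumerate(trace):
        if segment.startswith("<answer>"):
            return i
    return len(trace)
```

Because the interleaved trace opens with a visible plan, its first user-visible segment appears at position 0, while the baseline's appears only after all hidden reasoning; this is the mechanism behind the reduced time-to-first-response, and the early plan is what gives the user a chance to intervene.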