Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers

📅 2026-01-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a limitation of current large language models: when faced with missing or ambiguous information, they often rely on internal, uninformed reasoning and struggle to clarify premises or user intent. To overcome this, the authors propose the Proactive Interactive Reasoning (PIR) paradigm, which integrates user interaction directly into the reasoning process, enabling the model to actively pose clarifying questions to resolve uncertainty. PIR combines an uncertainty-aware supervised fine-tuning approach with a user-simulator-based policy optimization framework, aligned with user intent through a composite reward mechanism. Experimental results demonstrate that PIR significantly enhances performance across mathematical reasoning, code generation, and document editing, achieving improvements of up to 32.70% in accuracy, 22.90% in pass rate, and a 41.36-point gain in BLEU, while cutting both reasoning computation and redundant interaction rounds by nearly half.

📝 Abstract
Reasoning-oriented Large Language Models (LLMs) have achieved remarkable progress with Chain-of-Thought (CoT) prompting, yet they remain fundamentally limited by a "blind self-thinking" paradigm: performing extensive internal reasoning even when critical information is missing or ambiguous. We propose Proactive Interactive Reasoning (PIR), a new reasoning paradigm that transforms LLMs from passive solvers into proactive inquirers that interleave reasoning with clarification. Unlike existing search- or tool-based frameworks that primarily address knowledge uncertainty by querying external environments, PIR targets premise- and intent-level uncertainty through direct interaction with the user. PIR is implemented via two core components: (1) an uncertainty-aware supervised fine-tuning procedure that equips models with interactive reasoning capability, and (2) a user-simulator-based policy optimization framework driven by a composite reward that aligns model behavior with user intent. Extensive experiments on mathematical reasoning, code generation, and document editing demonstrate that PIR consistently outperforms strong baselines, achieving up to 32.70% higher accuracy, 22.90% higher pass rate, and a 41.36-point BLEU improvement, while cutting reasoning computation and unnecessary interaction turns by nearly half. Further reliability evaluations on factual knowledge, question answering, and missing-premise scenarios confirm the strong generalization and robustness of PIR. Model and code are publicly available at: https://github.com/SUAT-AIRI/Proactive-Interactive-R1
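The composite reward described above could be sketched along the following lines. This is a minimal illustration, not the paper's implementation: the reward terms, weights, and function names here are all assumptions chosen to reflect the stated goals (task success, useful clarification, and fewer interaction turns).

```python
# Hypothetical sketch of a composite reward for interactive reasoning.
# All weights and term definitions are illustrative assumptions.

def composite_reward(task_correct: bool,
                     question_relevant: bool,
                     num_turns: int,
                     max_turns: int = 4,
                     w_task: float = 1.0,
                     w_clarify: float = 0.3,
                     w_turns: float = 0.2) -> float:
    """Combine task success, clarification quality, and turn efficiency.

    - task_correct: did the final answer solve the task?
    - question_relevant: were the clarifying questions on-topic
      (e.g. judged by a user simulator)?
    - num_turns: how many interaction rounds were used.
    """
    r_task = w_task if task_correct else 0.0
    # Reward relevant questions, penalize irrelevant ones.
    r_clarify = w_clarify if question_relevant else -w_clarify
    # Linear penalty on interaction turns, capped at max_turns.
    r_turns = -w_turns * min(num_turns, max_turns) / max_turns
    return r_task + r_clarify + r_turns
```

Under this sketch, a correct answer reached with one relevant question scores higher than the same answer reached with irrelevant questions or many redundant turns, which is the behavior the paper's policy optimization is described as encouraging.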
Problem

Research questions and friction points this paper is trying to address.

reasoning
large language models
uncertainty
interactive reasoning
Chain-of-Thought
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proactive Interactive Reasoning
uncertainty-aware fine-tuning
user-simulator-based optimization
interactive clarification
reasoning efficiency