Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers

📅 2026-01-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a limitation of current large language models: when faced with missing or ambiguous information, they often rely on internal, uninformed reasoning and struggle to clarify premises or user intent. To overcome this, the authors propose the Proactive Interactive Reasoning (PIR) paradigm, which integrates user interaction directly into the reasoning process, enabling the model to actively pose clarifying questions to resolve uncertainty. PIR combines an uncertainty-aware supervised fine-tuning approach with a user-simulator-based policy optimization framework, aligned with user intent through a composite reward mechanism. Experimental results demonstrate that PIR significantly enhances performance across mathematical reasoning, code generation, and document editing, achieving improvements of up to 32.70% in accuracy, 22.90% in pass rate, and a 41.36-point gain in BLEU, while cutting both reasoning computation and redundant interaction rounds by nearly half.

📝 Abstract
Reasoning-oriented Large Language Models (LLMs) have achieved remarkable progress with Chain-of-Thought (CoT) prompting, yet they remain fundamentally limited by a "blind self-thinking" paradigm: performing extensive internal reasoning even when critical information is missing or ambiguous. We propose Proactive Interactive Reasoning (PIR), a new reasoning paradigm that transforms LLMs from passive solvers into proactive inquirers that interleave reasoning with clarification. Unlike existing search- or tool-based frameworks that primarily address knowledge uncertainty by querying external environments, PIR targets premise- and intent-level uncertainty through direct interaction with the user. PIR is implemented via two core components: (1) an uncertainty-aware supervised fine-tuning procedure that equips models with interactive reasoning capability, and (2) a user-simulator-based policy optimization framework driven by a composite reward that aligns model behavior with user intent. Extensive experiments on mathematical reasoning, code generation, and document editing demonstrate that PIR consistently outperforms strong baselines, achieving up to 32.70% higher accuracy, 22.90% higher pass rate, and a 41.36-point BLEU improvement, while cutting reasoning computation and unnecessary interaction turns by nearly half. Further reliability evaluations on factual knowledge, question answering, and missing-premise scenarios confirm the strong generalization and robustness of PIR. Model and code are publicly available at: https://github.com/SUAT-AIRI/Proactive-Interactive-R1
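The composite reward described above could be sketched along the following lines. This is a minimal illustration, not the paper's implementation: the reward terms, weights, and function names here are all assumptions chosen to reflect the stated goals (task success, useful clarification, and fewer interaction turns).

```python
# Hypothetical sketch of a composite reward for interactive reasoning.
# All weights and term definitions are illustrative assumptions.

def composite_reward(task_correct: bool,
                     question_relevant: bool,
                     num_turns: int,
                     max_turns: int = 4,
                     w_task: float = 1.0,
                     w_clarify: float = 0.3,
                     w_turns: float = 0.2) -> float:
    """Combine task success, clarification quality, and turn efficiency.

    - task_correct: did the final answer solve the task?
    - question_relevant: were the clarifying questions on-topic
      (e.g. judged by a user simulator)?
    - num_turns: how many interaction rounds were used.
    """
    r_task = w_task if task_correct else 0.0
    # Reward relevant questions, penalize irrelevant ones.
    r_clarify = w_clarify if question_relevant else -w_clarify
    # Linear penalty on interaction turns, capped at max_turns.
    r_turns = -w_turns * min(num_turns, max_turns) / max_turns
    return r_task + r_clarify + r_turns
```

Under this sketch, a correct answer reached with one relevant question scores higher than the same answer reached with irrelevant questions or many redundant turns, which is the behavior the paper's policy optimization is described as encouraging.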
Problem

Research questions and friction points this paper is trying to address.

reasoning
large language models
uncertainty
interactive reasoning
Chain-of-Thought
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proactive Interactive Reasoning
uncertainty-aware fine-tuning
user-simulator-based optimization
interactive clarification
reasoning efficiency