Pause or Fabricate? Training Language Models for Grounded Reasoning

📅 2026-04-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the tendency of large language models to fabricate missing information when inputs are incomplete, and their difficulty recognizing when the premises needed for sound reasoning are absent. The authors propose GRIL (Grounded Reasoning via Interactive Reinforcement Learning), a multi-turn reinforcement learning framework built around inferential boundary awareness. GRIL decomposes reasoning into two stages, "clarify and pause" and "grounded reasoning", and uses stage-specific rewards to encourage the model to pause proactively rather than generate unfounded answers. On the GSM8K-Insufficient and MetaMATH-Insufficient benchmarks, GRIL improves premise detection by up to 45%, raises task success by 30%, and reduces average response length by over 20%. Additional analyses indicate robustness to noisy user responses and generalization to out-of-distribution tasks.

📝 Abstract

Large language models have achieved remarkable progress on complex reasoning tasks. However, they often implicitly fabricate information when inputs are incomplete, producing confident but unreliable conclusions -- a failure mode we term ungrounded reasoning. We argue that this issue arises not from insufficient reasoning capability, but from the lack of inferential boundary awareness -- the ability to recognize when the necessary premises for valid inference are missing. To address this issue, we propose Grounded Reasoning via Interactive Reinforcement Learning (GRIL), a multi-turn reinforcement learning framework for grounded reasoning under incomplete information. GRIL decomposes the reasoning process into two stages: clarify and pause, which identifies whether the available information is sufficient, and grounded reasoning, which performs task solving once the necessary premises are established. We design stage-specific rewards to penalize hallucinations, enabling models to detect gaps, stop proactively, and resume reasoning after clarification. Experiments on GSM8K-Insufficient and MetaMATH-Insufficient show that GRIL significantly improves premise detection (up to 45%), leading to a 30% increase in task success while reducing average response length by over 20%. Additional analyses confirm robustness to noisy user responses and generalization to out-of-distribution tasks.
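
The abstract does not give the reward formulas, so the following is only a minimal sketch of what a stage-specific reward could look like, not the authors' implementation. The `Episode` fields, the `stage_rewards` function, and all numeric reward values are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """Hypothetical summary of one multi-turn rollout (not from the paper)."""
    premises_sufficient: bool  # ground truth: does the prompt contain every needed premise?
    model_paused: bool         # stage 1: did the model stop and ask for clarification?
    answer_correct: bool       # stage 2: was the final answer correct?


def stage_rewards(ep, pause_bonus=1.0, fabricate_penalty=-1.0,
                  needless_pause_penalty=-0.5, solve_bonus=1.0):
    """Return (stage-1 reward, stage-2 reward) under assumed, illustrative values."""
    # Stage 1 (clarify and pause): reward pausing when a premise is missing,
    # penalize answering anyway, and mildly penalize pausing when nothing is missing.
    if not ep.premises_sufficient:
        r1 = pause_bonus if ep.model_paused else fabricate_penalty
    else:
        r1 = needless_pause_penalty if ep.model_paused else 0.0

    # Stage 2 (grounded reasoning): credit a correct answer only when the model
    # did not skip a needed clarification, so a lucky guess earns nothing.
    fabricated = (not ep.premises_sufficient) and (not ep.model_paused)
    r2 = solve_bonus if (ep.answer_correct and not fabricated) else 0.0
    return r1, r2


if __name__ == "__main__":
    # Missing premise, model paused, then solved correctly after clarification.
    print(stage_rewards(Episode(False, True, True)))
    # Missing premise, model answered anyway and happened to be right.
    print(stage_rewards(Episode(False, False, True)))
```

Separating the two rewards keeps the pause decision from being swamped by the final-answer signal; how GRIL actually weights or combines its stage rewards is not stated in the abstract.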

Problem

Research questions and friction points this paper is trying to address.

ungrounded reasoning

inferential boundary awareness

hallucination

incomplete information

premise detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

grounded reasoning

interactive reinforcement learning

inferential boundary awareness