Abstain-R1: Calibrated Abstention and Post-Refusal Clarification via Verifiable RL

📅 2026-04-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models tend to hallucinate when a query lacks the information needed to resolve it, and they often have no reliable mechanism for abstaining or asking for clarification. This work proposes a clarification-aware reward for reinforcement learning with verifiable rewards (RLVR) that jointly optimizes two behaviors: answering answerable questions correctly, and explicitly abstaining on unanswerable ones while issuing a semantically aligned clarification of what is missing. The resulting 3B-parameter model, Abstain-R1, substantially improves over its base model on the Abstain-Test, Abstain-QA, and SelfAware benchmarks while preserving strong performance on answerable queries. Its behavior on unanswerable queries is competitive with larger systems such as DeepSeek-R1.

📝 Abstract

Reinforcement fine-tuning improves the reasoning ability of large language models, but it can also encourage them to answer unanswerable queries by guessing or hallucinating missing information. Existing abstention methods either train models to produce generic refusals or encourage follow-up clarifications without verifying whether those clarifications identify the key missing information. We study queries that are clear in meaning but cannot be reliably resolved from the given information, and argue that a reliable model should not only abstain, but also explain what is missing. We propose a clarification-aware RLVR reward that, while rewarding correct answers on answerable queries, jointly optimizes explicit abstention and semantically aligned post-refusal clarification on unanswerable queries. Using this reward, we train Abstain-R1, a 3B model that improves abstention and clarification on unanswerable queries while preserving strong performance on answerable ones. Experiments on Abstain-Test, Abstain-QA, and SelfAware show that Abstain-R1 substantially improves over its base model and achieves unanswerable-query behavior competitive with larger systems including DeepSeek-R1, suggesting that calibrated abstention and clarification can be learned through verifiable rewards rather than emerging from scale alone.
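
To make the shape of such a reward concrete, the following is a minimal sketch and not the paper's implementation. The abstract states only that the reward credits correct answers on answerable queries and jointly rewards explicit abstention plus semantically aligned clarification on unanswerable ones. Everything else here is an assumption for illustration: the `Example` fields, the abstention marker string, the reward values (0.0, 0.5, 1.0), the threshold, and the token-overlap function standing in for whatever verifier checks semantic alignment.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical abstention marker; the actual output format is not given in the abstract.
ABSTAIN_MARKER = "I don't know"


@dataclass
class Example:
    question: str
    answerable: bool
    gold_answer: Optional[str] = None    # reference answer (answerable queries)
    missing_info: Optional[str] = None   # reference description of what is missing


def normalize(text: str) -> str:
    """Lowercase and keep only word-like tokens, joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of tokens; a crude stand-in for a semantic-alignment verifier."""
    ta, tb = set(normalize(a).split()), set(normalize(b).split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def reward(example: Example, response: str,
           align: Callable[[str, str], float] = token_overlap,
           threshold: float = 0.5) -> float:
    text = normalize(response)
    marker = normalize(ABSTAIN_MARKER)
    abstained = marker in text

    if example.answerable:
        # Abstaining on an answerable query earns nothing; otherwise check the answer.
        if abstained or example.gold_answer is None:
            return 0.0
        gold = normalize(example.gold_answer)
        return 1.0 if f" {gold} " in f" {text} " else 0.0

    # Unanswerable query: guessing earns nothing.
    if not abstained:
        return 0.0
    # Partial credit for abstaining, full credit if the text after the
    # marker matches the reference description of the missing information.
    clarification = text.split(marker, 1)[1]
    aligned = align(clarification, example.missing_info or "") >= threshold
    return 1.0 if aligned else 0.5
```

Under these assumptions, a generic refusal on an unanswerable query scores 0.5, a refusal that names the missing information scores 1.0, and a guessed answer scores 0.0, which mirrors the distinction the abstract draws between generic refusals and verified clarifications.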

Problem

Research questions and friction points this paper is trying to address.

abstention

clarification

unanswerable queries

hallucination

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

calibrated abstention

post-refusal clarification

verifiable RL