Pay-Per-Search Models are Abstention Models

📅 2025-10-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently generate hallucinated responses because they cannot reliably recognize their knowledge boundaries. To address this, the authors propose MASH, a reinforcement learning–based framework that trains LLMs to decide autonomously, on a per-query basis, whether to invoke an external search tool or abstain from answering, without requiring predefined knowledge boundaries. Its core idea is to unify selective tool invocation and abstention into a single decision process: search invocation serves as an implicit abstention signal, and training jointly optimizes answer accuracy against tool-usage cost. Evaluated on three knowledge-intensive question-answering benchmarks, MASH improves multi-hop QA accuracy by 7.6% over strong baselines, and its ability to discriminate answerable from unanswerable questions matches that of dedicated abstention models, demonstrating effective off-the-shelf abstention.

📝 Abstract
LLMs cannot reliably recognize their parametric knowledge boundaries and often hallucinate answers to outside-of-boundary questions. In contrast, humans recognize their limitations and can either seek external help for such questions or abstain. In this paper, we introduce MASH (Modeling Abstention via Selective Help-seeking), a training framework that readily extracts abstentions from LLMs. Our key idea is that any external help-seeking by an LLM, i.e. search tool use, can serve as a proxy for abstention if the external help (search) is appropriately penalized while simultaneously rewarding answer accuracy. MASH operationalizes this idea using reinforcement learning with a pay-per-search reward. We run experiments on three knowledge-intensive QA datasets. Our results show that MASH substantially improves upon the selective help-seeking performance of prior efficient search approaches; on multi-hop datasets, MASH improves answer accuracy by 7.6%. Furthermore, MASH demonstrates strong off-the-shelf abstention -- it can distinguish between unanswerable/answerable questions and selectively generate responses for answerable questions -- showcasing behavior analogous to specialized abstention approaches. We emphasize that contrary to prior abstention methods, MASH does not require pre-determining knowledge boundaries to construct training data. Instead, MASH's abstentions are a by-product of training for the auxiliary selective help-seeking task. Overall, we show that MASH training effectively aligns search tool use with parametric knowledge, which can be successfully leveraged for making abstention decisions.
Problem

Research questions and friction points this paper is trying to address.

- LLMs hallucinate answers to questions beyond their parametric knowledge boundaries
- MASH trains LLMs to abstain via selective help-seeking
- The framework improves answer accuracy without predefined knowledge boundaries
Innovation

Methods, ideas, or system contributions that make the work stand out.

- Uses reinforcement learning with a pay-per-search reward
- Treats an LLM's external help-seeking (search tool use) as a proxy for abstention
- Aligns search tool use with parametric knowledge boundaries
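The pay-per-search idea can be illustrated with a minimal reward-shaping sketch. The function below is a hypothetical construction, not the paper's exact reward: the names `pay_per_search_reward`, `search_cost`, and the specific penalty values are assumptions. It captures the core mechanism the abstract describes, namely rewarding answer accuracy while charging a fixed cost for each search call, so that the policy only searches when its parametric knowledge is insufficient.

```python
def pay_per_search_reward(answer_correct: bool,
                          num_searches: int,
                          search_cost: float = 0.1) -> float:
    """Sketch of a pay-per-search reward for RL training (hypothetical values).

    Correct answers earn a reward of 1.0; each search tool call is
    penalized by `search_cost`. Under this shaping, searching is only
    worthwhile when it flips an otherwise-wrong answer to correct.
    """
    accuracy_reward = 1.0 if answer_correct else 0.0
    return accuracy_reward - search_cost * num_searches


# Answering correctly from parametric knowledge alone is the best outcome;
# searching and then answering correctly is still positive but costlier.
no_search = pay_per_search_reward(answer_correct=True, num_searches=0)   # 1.0
with_search = pay_per_search_reward(answer_correct=True, num_searches=2)  # 0.8
```

Because any search call reduces reward, a policy trained this way invokes search only for questions outside its knowledge boundary, which is exactly why the search decision doubles as an abstention signal.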