Pay-Per-Search Models are Abstention Models

📅 2025-10-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently generate hallucinated responses because they cannot reliably recognize their knowledge boundaries. To address this, the authors propose MASH, a reinforcement learning–based framework that trains LLMs to decide autonomously, on a per-query basis, whether to invoke an external search tool or abstain from answering, without requiring predefined knowledge boundaries. Its core idea is to unify selective tool invocation and abstention into a single decision process: search invocation serves as an implicit abstention signal, and training jointly optimizes answer accuracy against tool-usage cost. Evaluated on three knowledge-intensive question-answering benchmarks, MASH improves multi-hop QA accuracy by 7.6% over strong baselines, and its ability to discriminate answerable from unanswerable questions matches that of dedicated abstention models, demonstrating effective off-the-shelf abstention.

📝 Abstract
LLMs cannot reliably recognize their parametric knowledge boundaries and often hallucinate answers to outside-of-boundary questions. In contrast, humans recognize their limitations and can either seek external help for such questions or abstain. In this paper, we introduce MASH (Modeling Abstention via Selective Help-seeking), a training framework that readily extracts abstentions from LLMs. Our key idea is that any external help-seeking by an LLM, i.e. search tool use, can serve as a proxy for abstention if the external help (search) is appropriately penalized while simultaneously rewarding answer accuracy. MASH operationalizes this idea using reinforcement learning with a pay-per-search reward. We run experiments on three knowledge-intensive QA datasets. Our results show that MASH substantially improves upon the selective help-seeking performance of prior efficient search approaches; on multi-hop datasets, MASH improves answer accuracy by 7.6%. Furthermore, MASH demonstrates strong off-the-shelf abstention -- it can distinguish between unanswerable/answerable questions and selectively generate responses for answerable questions -- showcasing behavior analogous to specialized abstention approaches. We emphasize that contrary to prior abstention methods, MASH does not require pre-determining knowledge boundaries to construct training data. Instead, MASH's abstentions are a by-product of training for the auxiliary selective help-seeking task. Overall, we show that MASH training effectively aligns search tool use with parametric knowledge, which can be successfully leveraged for making abstention decisions.
Problem

Research questions and friction points this paper is trying to address.

- LLMs hallucinate answers to questions beyond their parametric knowledge boundaries
- MASH trains LLMs to abstain via selective help-seeking
- The framework improves answer accuracy without predefined knowledge boundaries
Innovation

Methods, ideas, or system contributions that make the work stand out.

- Uses reinforcement learning with a pay-per-search reward
- Treats an LLM's external help-seeking (search tool use) as a proxy for abstention
- Aligns search tool use with parametric knowledge boundaries
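The pay-per-search idea can be illustrated with a minimal reward-shaping sketch. The function below is a hypothetical construction, not the paper's exact reward: the names `pay_per_search_reward`, `search_cost`, and the specific penalty values are assumptions. It captures the core mechanism the abstract describes, namely rewarding answer accuracy while charging a fixed cost for each search call, so that the policy only searches when its parametric knowledge is insufficient.

```python
def pay_per_search_reward(answer_correct: bool,
                          num_searches: int,
                          search_cost: float = 0.1) -> float:
    """Sketch of a pay-per-search reward for RL training (hypothetical values).

    Correct answers earn a reward of 1.0; each search tool call is
    penalized by `search_cost`. Under this shaping, searching is only
    worthwhile when it flips an otherwise-wrong answer to correct.
    """
    accuracy_reward = 1.0 if answer_correct else 0.0
    return accuracy_reward - search_cost * num_searches


# Answering correctly from parametric knowledge alone is the best outcome;
# searching and then answering correctly is still positive but costlier.
no_search = pay_per_search_reward(answer_correct=True, num_searches=0)   # 1.0
with_search = pay_per_search_reward(answer_correct=True, num_searches=2)  # 0.8
```

Because any search call reduces reward, a policy trained this way invokes search only for questions outside its knowledge boundary, which is exactly why the search decision doubles as an abstention signal.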