🤖 AI Summary
To address the problem of large language models (LLMs) producing hallucinated or erroneous outputs when they lack the requisite knowledge, which undermines robustness and reliability, this paper proposes a training-free, inference-time abstention mechanism. The method introduces four distinct knowledge-accessibility test scenarios; designs a dual-source framework that jointly leverages parametric and contextual knowledge via dynamic relevance scoring and adaptive gating; and incorporates contrastive decoding to coordinate generation and abstention decisions. Evaluated across multiple benchmarks, the approach balances high-accuracy generation with high-confidence abstention, without any training overhead. Results show a 32.7% improvement in refusal accuracy and a 58.4% reduction in the erroneous generation rate, significantly enhancing model reliability and user trust.
📝 Abstract
Large Language Models (LLMs) demonstrate exceptional performance across diverse tasks by leveraging pre-trained (i.e., parametric) and external (i.e., contextual) knowledge. While substantial efforts have been made to enhance the utilization of both forms of knowledge, situations in which models lack relevant information remain underexplored. To investigate this challenge, we first present a controlled testbed featuring four distinct knowledge access scenarios, including the aforementioned edge case, revealing that conventional LLM usage exhibits insufficient robustness in handling all instances. Addressing this limitation, we propose Contrastive Decoding with Abstention (CDA), a novel training-free decoding method that allows LLMs to generate responses when relevant knowledge is available and to abstain otherwise. CDA estimates the relevance of both knowledge sources for a given input, adaptively deciding which type of information to prioritize and which to exclude. Through extensive experiments, we demonstrate that CDA can effectively perform accurate generation and abstention simultaneously, enhancing reliability and preserving user trust.
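The decision logic the abstract describes, scoring the relevance of each knowledge source, gating toward the more relevant one, and abstaining when neither suffices, can be sketched as a toy decision rule. This is a minimal illustration, not the paper's implementation: the relevance function, threshold, and all names here are hypothetical, and the actual CDA method operates on token distributions during decoding rather than on a simple overlap score.

```python
# Illustrative sketch of relevance-gated generation with abstention.
# All functions, names, and the threshold are hypothetical assumptions;
# CDA itself contrasts decoding distributions, not token overlap.

def relevance(query_tokens, knowledge_tokens):
    """Toy relevance score: fraction of query tokens covered by a knowledge source."""
    if not query_tokens or not knowledge_tokens:
        return 0.0
    return len(set(query_tokens) & set(knowledge_tokens)) / len(set(query_tokens))

def cda_decide(query, parametric_kb, context, threshold=0.5):
    """Choose parametric knowledge, contextual knowledge, or abstention."""
    q = query.lower().split()
    r_param = relevance(q, parametric_kb.lower().split() if parametric_kb else [])
    r_ctx = relevance(q, context.lower().split() if context else [])
    # If neither source is sufficiently relevant, abstain rather than risk
    # generating a hallucinated answer.
    if max(r_param, r_ctx) < threshold:
        return "abstain"
    # Otherwise prioritize the more relevant source (the adaptive gating step).
    return "use_context" if r_ctx >= r_param else "use_parametric"
```

For example, a query fully covered by parametric knowledge would route to `"use_parametric"`, while a query matching neither source would return `"abstain"`, mirroring the generate-or-abstain behavior the paper targets.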