🤖 AI Summary
This work addresses faithfulness hallucinations in large language models, where generated output contradicts or overlooks the input context. The authors propose Context-Fidelity Boosting (CFB), a lightweight decoding-time framework that improves faithfulness without retraining or architectural changes. For the first time, they adapt the logit-shaping concept from watermarking techniques to improve contextual faithfulness: additive adjustments are applied to token-level logits according to how strongly the input context supports each token. They introduce three boosting strategies: static boosting applies a fixed bias to source-supported tokens, context-aware boosting scales that bias by the divergence between next-token distributions with and without context, and token-aware boosting redistributes it using source-position attention and source-scoped semantic similarity. Experiments on summarization and question answering across multiple open-source large language models show consistent improvements in faithfulness metrics with minimal inference overhead.
📝 Abstract
Large language models (LLMs) often produce content that contradicts or overlooks information provided in the input context, a phenomenon known as faithfulness hallucination. In this paper, we propose Context-Fidelity Boosting (CFB), a lightweight and general decoding-time framework that reduces such hallucinations by increasing the generation probability of source-supported tokens. Motivated by logit-shaping principles from watermarking techniques, CFB applies additive token-level logit adjustments based on a token's degree of support from the input context. Specifically, we develop three boosting strategies: static boosting, which applies a fixed bias to source-supported tokens; context-aware boosting, which scales this bias using the divergence between next-token distributions with and without context; and token-aware boosting, which further redistributes the adaptive bias according to local relevance estimated from source-position attention and source-scoped semantic similarity. CFB requires no retraining or architectural changes, making it compatible with a wide range of LLMs. Experiments on summarization and question answering tasks across multiple open-source LLMs show that CFB consistently improves faithfulness metrics with minimal generation overhead. Our implementation is fully open-sourced.
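The three boosting strategies described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. Every function name, the default bias values, the choice of Jensen-Shannon divergence as the divergence measure, the assumption that a larger divergence yields a larger bias, and the proportional redistribution rule are assumptions made only for illustration. The `relevance` scores stand in for whatever combination of source-position attention and source-scoped semantic similarity the paper actually uses.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded in [0, 1] (assumed measure)."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    mid = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def static_boost(logits, supported, delta=2.0):
    """Static boosting: add a fixed bias to every source-supported token id."""
    return [l + delta if i in supported else l for i, l in enumerate(logits)]


def context_aware_boost(logits_ctx, logits_no_ctx, supported, delta_max=4.0):
    """Context-aware boosting: scale the bias by the divergence between the
    next-token distributions with and without context (direction assumed)."""
    scale = js_divergence(softmax(logits_ctx), softmax(logits_no_ctx))
    return static_boost(logits_ctx, supported, delta=delta_max * scale)


def token_aware_boost(logits_ctx, logits_no_ctx, relevance, delta_max=4.0):
    """Token-aware boosting: split the adaptive bias across supported tokens
    in proportion to a per-token relevance score (hypothetical rule).

    relevance maps token id -> non-negative score for supported tokens.
    """
    total = sum(relevance.values())
    if total == 0:
        return list(logits_ctx)
    scale = js_divergence(softmax(logits_ctx), softmax(logits_no_ctx))
    budget = delta_max * scale * len(relevance)  # same total bias as above
    return [
        l + budget * relevance.get(i, 0.0) / total
        for i, l in enumerate(logits_ctx)
    ]
```

In this sketch the adjusted logits would be passed to the usual sampling or greedy step in place of the raw logits. When the context does not change the next-token distribution, the divergence is zero and the adaptive variants leave the logits unchanged.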