FAITH: Factuality Alignment through Integrating Trustworthiness and Honestness

📅 2026-04-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) often generate factually incorrect outputs even when they possess the relevant knowledge, which undermines their reliability. This work proposes FAITH, a post-training framework that pursues finer-grained factuality alignment by integrating natural-language uncertainty signals with external knowledge. FAITH describes trustworthiness (knowledge possession) and honestness (answering behavior) in natural language as a knowledge state quadrant, derived from confidence scores and semantic entropy, and adds a retrieval-augmented module to ground responses in external passages. Its reward function combines correctness and uncertainty signals and is optimized with Proximal Policy Optimization (PPO). Experiments on four knowledge-intensive benchmarks show that FAITH improves factual accuracy and truthfulness.

📝 Abstract
Large Language Models (LLMs) can generate factually inaccurate content even when they have the corresponding knowledge, which critically undermines their reliability. Existing approaches attempt to mitigate this by incorporating uncertainty scores into QA prompts during training, but these numerical scores lack the semantic richness an LLM needs to properly understand its internal states of trustworthiness and honestness, leading to insufficient factuality alignment. We introduce FAITH (Factuality Alignment through Integrating Trustworthiness and Honestness), a post-training framework for factuality alignment that integrates natural-language uncertainty signals with external knowledge. Specifically, we augment training datasets by computing confidence scores and semantic entropy from LLM outputs and mapping them into a knowledge state quadrant that describes the model's internal knowledge possession (trustworthiness) and answering behaviors (honestness) in natural language. Based on this enhanced data, we design a reward function that considers both correctness and uncertainty signals, and fine-tune the LLM using the Proximal Policy Optimization (PPO) algorithm. To further mitigate weakly grounded responses, we design a retrieval-augmented module that retrieves relevant external passages, improving the consistency between internal and external knowledge representations. Extensive experiments on four knowledge-intensive benchmarks demonstrate that FAITH enhances the factual accuracy and truthfulness of LLMs.
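
The data-augmentation and reward steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the thresholds, the quadrant wording, the assignment of signals to axes, or the reward formula, so everything below is an assumption made for illustration: `QUADRANTS`, `knowledge_state`, `reward`, the 0.5 thresholds, and `state_weight` are hypothetical names and values, and the sketch assumes that confidence indicates knowledge possession while low semantic entropy indicates consistent answering behavior. Sampled answers are assumed to have been grouped into semantic clusters beforehand.

```python
import math
from collections import Counter

# Hypothetical natural-language descriptions for the four quadrants.
# Keys are (knows, consistent); the wording is illustrative, not the paper's.
QUADRANTS = {
    (True, True): "The model knows the answer and answers consistently.",
    (True, False): "The model knows the answer but answers inconsistently.",
    (False, True): "The model lacks the knowledge yet answers consistently.",
    (False, False): "The model lacks the knowledge and answers inconsistently.",
}


def semantic_entropy(cluster_ids):
    """Shannon entropy (in nats) over semantic clusters of sampled answers.

    cluster_ids holds one cluster label per sampled answer; answers that
    share a meaning are assumed to carry the same label.
    """
    if not cluster_ids:
        raise ValueError("need at least one sampled answer")
    total = len(cluster_ids)
    probs = [count / total for count in Counter(cluster_ids).values()]
    return -sum(p * math.log(p) for p in probs)


def knowledge_state(confidence, entropy, conf_threshold=0.5, entropy_threshold=0.5):
    """Map the two numeric signals to a natural-language quadrant description.

    Assumption: confidence >= threshold means the knowledge is possessed,
    and entropy <= threshold means the answering behavior is consistent.
    """
    knows = confidence >= conf_threshold
    consistent = entropy <= entropy_threshold
    return QUADRANTS[(knows, consistent)]


def reward(is_correct, predicted_state, reference_state, state_weight=0.5):
    """Illustrative reward: a correctness term plus an uncertainty term.

    The uncertainty term is a bonus granted when the state the model
    expresses matches the state computed from its own samples.
    """
    correctness = 1.0 if is_correct else -1.0
    state_bonus = state_weight if predicted_state == reference_state else 0.0
    return correctness + state_bonus


# Usage: four sampled answers, three of which fall into the same cluster.
samples = ["paris", "paris", "paris", "lyon"]
entropy = semantic_entropy(samples)
state = knowledge_state(confidence=0.75, entropy=entropy)
score = reward(is_correct=True, predicted_state=state, reference_state=state)
```

In a PPO setup, a scalar such as `score` would serve as the per-response reward; the actual weighting between correctness and uncertainty signals in FAITH may differ from this sketch.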
Problem

Research questions and friction points this paper is trying to address.

factuality
trustworthiness
honestness
large language models
uncertainty
Innovation

Methods, ideas, or system contributions that make the work stand out.

factuality alignment
trustworthiness
honestness
retrieval-augmented generation
uncertainty quantification