The Curious Case of Factuality Finetuning: Models' Internal Beliefs Can Improve Factuality

📅 2025-07-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently generate factual hallucinations in long-form generation. Existing fine-tuning approaches relying on high-quality gold data are hampered by prohibitive annotation costs and may exacerbate downstream hallucinations by introducing correct but unfamiliar knowledge to the model. To address this, we propose leveraging the LLM’s internal beliefs—i.e., its self-assessed factual plausibility of generated outputs—as a primary signal for filtering training data. Our method uses model-internal factuality judgments as the core filtering criterion, and we compare it against self-checking, external verification, and gold-data-based selection. Experiments across three domains demonstrate that fine-tuning exclusively on generations deemed factual by the model itself yields significantly higher factual consistency than fine-tuning on gold data alone or other filtering configurations. Moreover, our approach substantially improves cross-domain generalization, confirming the efficacy and robustness of internal belief–guided data curation.

📝 Abstract
Language models are prone to hallucination - generating text that is factually incorrect. Finetuning models on high-quality factual information can potentially reduce hallucination, but concerns remain; obtaining factual gold data can be expensive and training on correct but unfamiliar data may potentially lead to even more downstream hallucination. What data should practitioners finetune on to mitigate hallucinations in language models? In this work, we study the relationship between the factuality of finetuning data and the prevalence of hallucinations in long-form generation tasks. Counterintuitively, we find that finetuning on factual gold data is not as helpful as finetuning on model-generated data that models believe to be factual. Next, we evaluate filtering strategies applied to both factual gold data and model-generated data, and find that finetuning on model-generated data filtered by models' own internal judgments often leads to better overall factuality than other configurations: training on gold data filtered by models' judgments, training on gold data alone, or training on model-generated data that is supported by gold data. These factuality improvements transfer across the three domains we study, suggesting that a model's own beliefs can provide a powerful signal for factuality.
Problem

Research questions and friction points this paper is trying to address.

How to reduce hallucinations in language models
Effect of finetuning data factuality on hallucinations
Using models' internal beliefs to improve factuality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Finetuning on model-generated factual data
Using models' internal judgments for filtering
Improving factuality across multiple domains
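The belief-guided data curation described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `generate` and `judge` are hypothetical callables standing in for the model's generation and self-assessment interfaces, and the yes/no judgment prompt is an assumed wording.

```python
def self_judge_factual(judge, text):
    """Ask the model to assess the factuality of its own output.

    `judge` is a hypothetical callable mapping a prompt string to the
    model's text response; the prompt wording here is an assumption.
    """
    prompt = (
        "Is the following passage factually accurate? "
        "Answer Yes or No.\n\n" + text
    )
    answer = judge(prompt)
    return answer.strip().lower().startswith("yes")


def build_finetuning_set(generate, judge, prompts):
    """Keep only model generations that the model itself believes are factual.

    `generate` is a hypothetical callable mapping a task prompt to a
    model generation. The surviving (prompt, response) pairs form the
    finetuning corpus.
    """
    kept = []
    for p in prompts:
        response = generate(p)
        if self_judge_factual(judge, response):
            kept.append({"prompt": p, "response": response})
    return kept
```

In this sketch the same model plays both roles (generator and judge), which mirrors the paper's finding that the model's internal belief about its own outputs is the useful filtering signal, rather than agreement with external gold data.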