🤖 AI Summary
Existing LLM-based fact-checking systems rely heavily on external knowledge sources, resulting in high latency, low reliability, poor interpretability, and frequent hallucinations. This paper proposes REFLEX, a novel self-optimizing and interpretable fact-checking paradigm. Methodologically, REFLEX introduces contrastive activation-based steering vectors, the first such approach, to disentangle the stylistic and substantive dimensions of "truthfulness" within the model's latent layers, enabling accurate verdicts and high-quality explanations using only the model's internal knowledge. The framework integrates contrastive activation analysis, role-play dialogue training, latent-layer explanation-signal guidance, and self-refining few-shot learning. With only 465 self-refined samples, REFLEX achieves state-of-the-art performance. Cross-model transfer of explanation signals yields up to a 7.57% improvement, significantly enhancing reasoning faithfulness, inference efficiency, and real-time applicability.
📝 Abstract
The prevalence of misinformation on social media threatens public trust, demanding automated fact-checking systems that provide accurate verdicts with interpretable explanations. However, existing large language model (LLM)-based approaches often rely heavily on external knowledge sources, introducing substantial latency and even hallucinations that undermine reliability, interpretability, and the responsiveness crucial for real-time use. To address these challenges, we propose REason-guided Fact-checking with Latent EXplanations (REFLEX), a plug-and-play, self-refining paradigm that leverages the internal knowledge of the backbone model to improve both verdict accuracy and explanation quality. REFLEX reformulates fact-checking as a role-play dialogue and jointly trains verdict prediction and explanation generation. It adaptively extracts contrastive activation pairs between the backbone model and its fine-tuned variant to construct steering vectors that naturally disentangle truthfulness into style and substance. These activation-level signals guide inference and suppress noisy explanations, enabling more faithful and efficient reasoning. Experiments on real-world datasets show that REFLEX outperforms previous methods that steer toward a single truth direction, underscoring the challenge traditional approaches face in handling the subtle, human-unknown notion of truth in fact-checking. Remarkably, with only 465 self-refined training samples, REFLEX achieves state-of-the-art performance. Furthermore, models trained with explanatory objectives can effectively guide those trained without them, yielding up to a 7.57% improvement, highlighting that internal explanation signals play a dual role in both interpreting and enhancing factual reasoning.
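The core mechanism the abstract describes, building a steering vector from contrastive activation pairs and adding it to hidden states at inference time, can be sketched in a few lines. This is a minimal NumPy illustration of the general activation-steering idea, not the paper's actual implementation; the function names, the mean-difference construction, and the scaling parameter `alpha` are assumptions for illustration only.

```python
import numpy as np

def steering_vector(acts_finetuned: np.ndarray, acts_base: np.ndarray) -> np.ndarray:
    """Build a steering vector from contrastive activation pairs.

    Each row holds one example's hidden-layer activation from the
    fine-tuned variant vs. the backbone model; the averaged difference
    points along the direction the fine-tuning induced.
    """
    diffs = acts_finetuned - acts_base            # (n_examples, hidden_dim)
    v = diffs.mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-8)         # unit-normalize the direction

def steer(hidden_state: np.ndarray, v: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Shift a hidden state along the steering direction at inference time."""
    return hidden_state + alpha * v

# Toy demo: backbone activations plus a constant shift stand in for the
# fine-tuned variant's activations.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))                   # 8 examples, hidden_dim = 16
fine = base + 0.5                                 # shifted along one direction
v = steering_vector(fine, base)
h_steered = steer(rng.normal(size=16), v, alpha=2.0)
```

In practice such a vector would be extracted per layer from real model activations and added during the forward pass; the toy arrays above only demonstrate the arithmetic.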