🤖 AI Summary
Existing LLM-based fact-checking systems rely heavily on external knowledge sources, resulting in high latency, low reliability, poor interpretability, and frequent hallucinations. This paper proposes REFLEX, a novel self-optimizing and interpretable fact-checking paradigm. Methodologically, REFLEX introduces contrastive activation-based steering vectors, the first such approach, to disentangle the stylistic and substantive dimensions of "truthfulness" within the model's latent layers, enabling accurate verdicts and high-quality explanations using only the model's internal knowledge. The framework integrates contrastive activation analysis, role-play dialogue training, latent-layer explanation-signal guidance, and self-refining few-shot learning. With only 465 self-refined samples, REFLEX achieves state-of-the-art performance. Cross-model transfer of explanation signals yields up to a 7.57% improvement, significantly enhancing reasoning faithfulness, inference efficiency, and real-time applicability.
📝 Abstract
The prevalence of misinformation on social media threatens public trust, demanding automated fact-checking systems that provide accurate verdicts with interpretable explanations. However, existing large language model (LLM)-based approaches often rely heavily on external knowledge sources, introducing substantial latency and even hallucinations that undermine reliability, interpretability, and the responsiveness crucial for real-time use. To address these challenges, we propose REason-guided Fact-checking with Latent EXplanations (REFLEX), a plug-and-play, self-refining paradigm that leverages the internal knowledge of the backbone model to improve both verdict accuracy and explanation quality. REFLEX reformulates fact-checking as a role-play dialogue and jointly trains verdict prediction and explanation generation. It adaptively extracts contrastive activation pairs between the backbone model and its fine-tuned variant to construct steering vectors that naturally disentangle truthfulness into style and substance. These activation-level signals guide inference and suppress noisy explanations, enabling more faithful and efficient reasoning. Experiments on real-world datasets show that REFLEX outperforms previous methods that steer toward a single truth direction, underscoring the challenge traditional approaches face in handling the subtle, human-unknown notion of truth in fact-checking. Remarkably, with only 465 self-refined training samples, REFLEX achieves state-of-the-art performance. Furthermore, models trained with explanatory objectives can effectively guide those trained without them, yielding up to a 7.57% improvement, highlighting that internal explanation signals play a dual role in both interpreting and enhancing factual reasoning.
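The core mechanism the abstract describes, building a steering vector from contrastive activation pairs and adding it to hidden states at inference time, can be sketched in a few lines. This is a minimal NumPy illustration of the general activation-steering idea, not the paper's actual implementation; the function names, the mean-difference construction, and the scaling parameter `alpha` are assumptions for illustration only.

```python
import numpy as np

def steering_vector(acts_finetuned: np.ndarray, acts_base: np.ndarray) -> np.ndarray:
    """Build a steering vector from contrastive activation pairs.

    Each row holds one example's hidden-layer activation from the
    fine-tuned variant vs. the backbone model; the averaged difference
    points along the direction the fine-tuning induced.
    """
    diffs = acts_finetuned - acts_base            # (n_examples, hidden_dim)
    v = diffs.mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-8)         # unit-normalize the direction

def steer(hidden_state: np.ndarray, v: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Shift a hidden state along the steering direction at inference time."""
    return hidden_state + alpha * v

# Toy demo: backbone activations plus a constant shift stand in for the
# fine-tuned variant's activations.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))                   # 8 examples, hidden_dim = 16
fine = base + 0.5                                 # shifted along one direction
v = steering_vector(fine, base)
h_steered = steer(rng.normal(size=16), v, alpha=2.0)
```

In practice such a vector would be extracted per layer from real model activations and added during the forward pass; the toy arrays above only demonstrate the arithmetic.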