🤖 AI Summary
To address the challenge of verifying complex claims against heterogeneous, multi-source evidence, this paper proposes DebateCV, the first claim verification framework based on debate among multiple LLM agents. Two Debaters argue for and against a claim over multiple rounds, producing interpretable reasoning chains, and a Moderator (the judge) weighs their arguments and issues the final factual verdict. To overcome the scarcity of annotated debate data, the authors synthesize debate data with the zero-shot version of the framework and use it to post-train the Moderator, improving its discriminative capability. Evaluated under varying levels of evidence quality, the approach substantially outperforms existing state-of-the-art methods, with reported absolute accuracy improvements of 3.2–5.7 percentage points on benchmarks including FEVER and FEVEROUS. The source code and synthesized dataset are publicly released.
📝 Abstract
Claim verification is critical for enhancing digital literacy. However, state-of-the-art single-LLM methods struggle to verify complex claims that involve multifaceted evidence. Inspired by real-world fact-checking practices, we propose DebateCV, the first claim verification framework that adopts a debate-driven methodology using multiple LLM agents. In our framework, two Debaters take opposing stances on a claim and engage in multi-round argumentation, while a Moderator evaluates the arguments and renders a verdict with a justification. To further improve the Moderator's performance, we introduce a novel post-training strategy that leverages synthetic debate data generated by zero-shot DebateCV, addressing the scarcity of real-world debate-driven claim verification data. Experimental results show that our method outperforms existing claim verification methods under varying levels of evidence quality. Our code and dataset are publicly available at https://anonymous.4open.science/r/DebateCV-6781.
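
The debate protocol described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `run_debate` function, the `LLM` callable type, the prompt wording, the default of two rounds, and the Moderator's reply format (verdict label on the first line, justification after it) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


@dataclass
class DebateResult:
    verdict: str
    justification: str
    transcript: List[str] = field(default_factory=list)


def run_debate(
    claim: str,
    evidence: List[str],
    affirmative: LLM,
    negative: LLM,
    moderator: LLM,
    rounds: int = 2,  # assumed default; the abstract only says "multi-round"
) -> DebateResult:
    """Two Debaters argue opposing stances; a Moderator renders the verdict."""
    transcript: List[str] = []
    evidence_text = "\n".join(f"- {e}" for e in evidence)
    debaters = (
        ("Affirmative", "is true", affirmative),
        ("Negative", "is false", negative),
    )

    for round_no in range(1, rounds + 1):
        for name, stance, agent in debaters:
            # Each Debater sees the claim, the evidence, and the debate so far.
            prompt = (
                f"Claim: {claim}\nEvidence:\n{evidence_text}\n"
                "Debate so far:\n" + "\n".join(transcript) + "\n"
                f"Argue that the claim {stance}."
            )
            transcript.append(f"[Round {round_no}] {name}: {agent(prompt)}")

    # The Moderator reads the full transcript and decides.
    reply = moderator(
        f"Claim: {claim}\nEvidence:\n{evidence_text}\n"
        "Debate:\n" + "\n".join(transcript) + "\n"
        "Give a verdict on the first line, then a justification."
    )
    verdict, _, justification = reply.partition("\n")
    return DebateResult(verdict.strip(), justification.strip(), transcript)
```

Any callable that wraps a model API can be passed as a Debater or Moderator. Under the same assumptions, the returned transcript and verdict could also be stored as one synthetic debate record of the kind the abstract describes for post-training the Moderator.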