🤖 AI Summary
This work addresses the limitations of current large language and reasoning models in specialized fact-checking, which stem from insufficient domain knowledge, weak contextual understanding, and ineffective integration of multi-turn expert feedback. The authors propose Co-FactChecker, a human-AI collaborative fact-checking framework built around a “shared scratchpad” interaction paradigm: instead of conventional dialogue-based exchanges, expert feedback is translated directly into edits on the model’s reasoning trace. Built on a large reasoning model, the approach maps natural language feedback to targeted trace modifications, supporting multi-round collaborative verification. The authors also give theoretical results indicating that trace-editing has advantages over multi-turn dialogue. In automatic evaluations, the system outperforms existing autonomous and human-AI collaborative methods. In human evaluations, it is preferred over multi-turn dialogue, producing higher-quality reasoning and verdicts along with reasoning traces that are easier to interpret and more useful.
📝 Abstract
Professional fact-checkers rely on domain knowledge and deep contextual understanding to verify claims. Large language models (LLMs) and large reasoning models (LRMs) lack such grounding and primarily reason from the available evidence alone, creating a mismatch between expert-led and fully automated claim verification. To bridge this gap, we posit human-AI collaboration as a more promising path forward, where expert feedback, grounded in real-world knowledge and domain expertise, guides the model's reasoning. However, existing LRMs are hard to calibrate to natural language feedback, particularly in a multi-turn interaction setup. We propose Co-FactChecker, a framework for human-AI collaborative claim verification. We introduce a new interaction paradigm that treats the model's thinking trace as a shared scratchpad. Co-FactChecker translates expert feedback into trace-edits that introduce targeted modifications to the trace, sidestepping the shortcomings of dialogue-based interaction. We provide theoretical results showing that trace-editing offers advantages over multi-turn dialogue, and our automatic evaluations demonstrate that Co-FactChecker outperforms existing autonomous and human-AI collaboration approaches. Human evaluations further show that Co-FactChecker is preferred over multi-turn dialogue, producing higher-quality reasoning and verdicts, along with thinking traces that are easier to interpret and more useful.
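
To make the shared-scratchpad idea concrete, the following is a minimal illustrative sketch of applying a single trace-edit to a thinking trace. It is not the Co-FactChecker implementation: the abstract does not specify how traces or edits are represented, so the `TraceEdit` class, the three operations (`replace`, `insert_after`, `delete`), the `apply_edit` function, and the example claim are all hypothetical. The step that maps natural language feedback to an edit, which in the paper is handled by the framework, is assumed to have already produced the `TraceEdit` shown here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TraceEdit:
    """Hypothetical edit on a thinking trace (not the paper's actual schema)."""
    op: str                     # "replace", "insert_after", or "delete"
    index: int                  # position of the trace step the edit targets
    text: Optional[str] = None  # new step text; unused for "delete"


def apply_edit(trace: List[str], edit: TraceEdit) -> List[str]:
    """Return a new trace with the edit applied; the input trace is not mutated."""
    if not 0 <= edit.index < len(trace):
        raise IndexError(f"step index {edit.index} is outside the trace")
    if edit.op in ("replace", "insert_after") and edit.text is None:
        raise ValueError(f"'{edit.op}' requires replacement text")

    revised = list(trace)
    if edit.op == "replace":
        revised[edit.index] = edit.text
    elif edit.op == "insert_after":
        revised.insert(edit.index + 1, edit.text)
    elif edit.op == "delete":
        del revised[edit.index]
    else:
        raise ValueError(f"unknown operation: {edit.op}")
    return revised


# Made-up trace for a made-up claim, one reasoning step per list entry.
trace = [
    "Claim: the policy took effect in 2019.",
    "Evidence A says the policy was announced in 2019.",
    "Therefore the claim is supported.",
]

# Assume expert feedback ("announcement is not the same as enactment")
# has already been translated into a targeted edit on step 2.
edit = TraceEdit(
    op="replace",
    index=2,
    text="An announcement is not enactment, so the effective date still needs checking.",
)

revised = apply_edit(trace, edit)
for step in revised:
    print(step)
```

In a dialogue-based setup the same feedback would be appended as a new turn and the flawed step would remain in the context. Under this sketch's assumptions, the edit instead rewrites the flawed step in place, so any continued reasoning starts from the corrected trace.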