🤖 AI Summary
This work addresses a key limitation in existing fact-checking methods, which treat evidence retrieval as a static, isolated step, thereby preventing cross-claim evidence reuse and compromising both efficiency and consistency. To overcome this, the authors propose MERMAID, a novel framework built around a persistent evidence memory mechanism. MERMAID tightly couples retrieval and reasoning within an iterative reasoning-action loop through multi-agent collaboration, and leverages structured knowledge representations to enable dynamic acquisition and reuse of evidence. The approach improves verification efficiency, consistency, and accuracy, achieving state-of-the-art performance across three fact-checking benchmarks and two claim verification datasets while reducing redundant search operations.
📝 Abstract
Assessing the veracity of online content has become increasingly critical. Large language models (LLMs) have recently enabled substantial progress in automated veracity assessment, including automated fact-checking and claim verification systems. Typical veracity assessment pipelines break complex claims down into sub-claims, retrieve external evidence, and then apply LLM reasoning to assess veracity. However, existing methods often treat evidence retrieval as a static, isolated step and do not effectively manage or reuse retrieved evidence across claims. In this work, we propose MERMAID, a memory-enhanced multi-agent veracity assessment framework that tightly couples the retrieval and reasoning processes. MERMAID integrates agent-driven search, structured knowledge representations, and a persistent memory module within a Reason-Action-style iterative process, enabling dynamic evidence acquisition and cross-claim evidence reuse. By retaining retrieved evidence in an evidence memory, the framework reduces redundant searches and improves verification efficiency and consistency. We evaluate MERMAID on three fact-checking benchmarks and two claim-verification datasets using multiple LLMs, including the GPT, LLaMA, and Qwen families. Experimental results show that MERMAID achieves state-of-the-art performance while improving search efficiency, demonstrating the effectiveness of synergizing retrieval, reasoning, and memory for reliable veracity assessment.
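To make the core idea concrete, the sketch below illustrates a memory-backed reason-act verification loop in the spirit described above: evidence retained from earlier claims is consulted before any new search is issued, so overlapping claims trigger fewer redundant retrievals. This is a minimal illustration, not the paper's implementation; the names `EvidenceMemory`, `verify_claim`, and the stub `search_fn`/`reason_fn` interfaces are assumptions for the example.

```python
from dataclasses import dataclass, field

@dataclass
class EvidenceMemory:
    """Persistent cross-claim store: keyword -> previously retrieved evidence.
    (Illustrative stand-in for the paper's evidence memory module.)"""
    store: dict = field(default_factory=dict)

    def add(self, keywords, evidence):
        for kw in keywords:
            self.store.setdefault(kw.lower(), []).append(evidence)

    def lookup(self, keywords):
        hits = []
        for kw in keywords:
            hits.extend(self.store.get(kw.lower(), []))
        return hits

def verify_claim(claim_keywords, memory, search_fn, reason_fn, max_steps=3):
    """Reason-Action-style loop: reuse memory first, search only when the
    reasoner reports that evidence is still insufficient (returns None)."""
    evidence = memory.lookup(claim_keywords)  # cross-claim reuse before searching
    searches = 0
    for _ in range(max_steps):
        verdict = reason_fn(evidence)  # e.g. an LLM call in the real system
        if verdict is not None:
            return verdict, searches
        new_evidence = search_fn(claim_keywords)  # agent-driven search action
        searches += 1
        memory.add(claim_keywords, new_evidence)  # persist for later claims
        evidence.append(new_evidence)
    return "NOT ENOUGH INFO", searches
```

With stub search and reasoning functions, a first claim about "earth" costs one search, while a second claim sharing that keyword is verified from memory with zero searches, which is the efficiency/consistency effect the abstract attributes to evidence reuse.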