🤖 AI Summary
To address the challenge of cross-modal misinformation detection, exacerbated by out-of-context (OOC) image-text misalignment, this paper proposes a domain-agnostic multi-agent debate framework. The method integrates multimodal large language models (MLLMs) with dynamic external knowledge retrieval, enabling specialized agents to collaboratively debate image-text consistency and achieve zero-shot, interpretable visual misinformation detection. Key contributions include: (i) the first debate-driven multi-agent reasoning mechanism for this task, replacing single-point classification with knowledge-augmented consensus decision-making; and (ii) explicit modeling of cross-modal semantic alignment and OOC reasoning. Evaluated on mainstream benchmarks, the framework achieves state-of-the-art performance, and ablation studies confirm that external knowledge retrieval significantly improves accuracy. User studies further show that the framework improves the ability of both experts and the general public to identify misinformation and increases trust in its decisions.
📝 Abstract
One of the most challenging forms of misinformation involves the out-of-context (OOC) use of images paired with misleading text, creating false narratives. Existing AI-driven detection systems lack explainability and require expensive fine-tuning. We address these issues with LLM-Consensus, a multi-agent debate system for OOC misinformation detection, in which multimodal agents collaborate to assess contextual consistency and request external information to strengthen cross-context reasoning and decision-making. Our framework enables explainable detection with state-of-the-art accuracy even without domain-specific fine-tuning. Extensive ablation studies confirm that external retrieval significantly improves detection accuracy, and user studies demonstrate that LLM-Consensus boosts performance for both experts and non-experts. These results position LLM-Consensus as a powerful tool for autonomous and citizen intelligence applications.
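The core mechanism described above is a debate loop: multimodal agents exchange assessments of image-text consistency, may request external evidence mid-debate, and stop once they converge on a consensus verdict. The sketch below illustrates one plausible shape of that loop; the `query_mllm` and `retrieve_evidence` stubs, the prompt wording, the `REQUEST:` convention, and the agent/round counts are all illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    label: str      # "consistent" or "out-of-context"
    rationale: str  # natural-language explanation of the judgment

def query_mllm(prompt: str, image_path: str) -> Verdict:
    """Placeholder: call any multimodal LLM and parse its reply into a Verdict."""
    raise NotImplementedError

def retrieve_evidence(query: str) -> str:
    """Placeholder: external retrieval, e.g. a web or reverse-image search."""
    raise NotImplementedError

def debate(image_path: str, caption: str,
           n_agents: int = 3, max_rounds: int = 3) -> Verdict:
    """Run a multi-agent debate until the agents agree or rounds run out."""
    transcript = ""  # shared debate history visible to every agent
    verdicts: list[Verdict] = []
    for _ in range(max_rounds):
        verdicts = []
        for agent_id in range(n_agents):
            prompt = (
                f"You are agent {agent_id}. Decide whether the caption "
                f"matches the image.\nCaption: {caption}\n"
                f"Debate so far:{transcript}\n"
                "If you need outside facts, reply 'REQUEST: <search query>'."
            )
            answer = query_mllm(prompt, image_path)
            # An agent may ask for external evidence before committing.
            if answer.rationale.startswith("REQUEST:"):
                query = answer.rationale.removeprefix("REQUEST:").strip()
                evidence = retrieve_evidence(query)
                transcript += f"\n[evidence] {evidence}"
                answer = query_mllm(prompt + f"\nEvidence: {evidence}",
                                    image_path)
            verdicts.append(answer)
            transcript += f"\n[agent {agent_id}] {answer.label}: {answer.rationale}"
        if len({v.label for v in verdicts}) == 1:
            return verdicts[0]  # unanimous: consensus reached, stop early
    # No consensus within the round budget: fall back to a majority vote.
    labels = [v.label for v in verdicts]
    majority = max(set(labels), key=labels.count)
    return next(v for v in verdicts if v.label == majority)
```

Early stopping on unanimity with a majority-vote fallback is one reasonable reading of "knowledge-augmented consensus decision-making"; the paper may use a different aggregation rule or termination criterion.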