π€ AI Summary
To address the challenges of geolocation, temporal attribution, and cross-platform provenance in multimedia misinformation detection, this paper proposes a multi-agent collaborative automated verification framework. Centered on multimodal large language models (MLLMs), the framework integrates reverse image search, metadata parsing, fact-checking databases, and authoritative news source processing into a six-stage deep analytical pipeline, enabling context-aware cross-verification across spatial, temporal, provenance, and motivational dimensions. Its key innovation lies in the first decoupled yet coordinated integration of MLLM-driven deep-reasoning agents with domain-specific toolchains, supporting dynamic task dispatching and evidence-based closed-loop validation. Evaluated on multiple challenging benchmarks, the system achieves substantial improvements: +23.6% in geolocation accuracy, +18.4% in temporal inference accuracy, and +31.2% in cross-platform provenance success rateβthereby enabling explainable, trustworthy authenticity assessment in complex misinformation scenarios.
π Abstract
This paper presents our submission to the ACMMM25 - Grand Challenge on Multimedia Verification. We developed a multi-agent verification system that combines Multimodal Large Language Models (MLLMs) with specialized verification tools to detect multimedia misinformation. Our system operates through six stages: raw data processing, planning, information extraction, deep research, evidence collection, and report generation. The core Deep Researcher Agent employs four tools: reverse image search, metadata analysis, fact-checking databases, and verified news processing that extracts spatial, temporal, attribution, and motivational context. We demonstrate our approach on a challenge dataset sample involving complex multimedia content. Our system successfully verified content authenticity, extracted precise geolocation and timing information, and traced source attribution across multiple platforms, effectively addressing real-world multimedia verification scenarios.