🤖 AI Summary
This work addresses the limitations of existing fact-checking approaches, which are often confined to unimodal text or lack interpretability, thereby struggling to verify claims requiring joint visual and textual evidence. The authors propose a two-tier multimodal graph architecture that enables fine-grained evidence retrieval through a bidirectional image-text reasoning mechanism. Multimodal information is fused at both token and evidence levels to support claim verification, while a dedicated fusion decoder generates natural language explanations—realizing, for the first time, an integrated pipeline for retrieval, verification, and explanation. Key contributions include a novel multi-granularity fusion strategy, the bidirectional reasoning mechanism, and AIChartClaim, the first multimodal claim dataset centered on scientific figures in the AI domain. Experiments demonstrate that the proposed method significantly outperforms current baselines on multimodal claim verification tasks.
📝 Abstract
Verifying the truthfulness of claims usually requires joint multi-modal reasoning over both textual and visual evidence, such as analyzing both a textual caption and a chart image for claim verification. In addition, to make the reasoning process transparent, a textual explanation is necessary to justify the verification result. However, most claim verification work focuses on reasoning over textual evidence alone or ignores explainability, resulting in inaccurate and unconvincing verification. To address this problem, we propose a novel model that jointly achieves evidence retrieval, multi-modal claim verification, and explanation generation. For evidence retrieval, we construct a two-layer multi-modal graph over claims and evidence, where we design image-to-text and text-to-image reasoning for multi-modal retrieval. For claim verification, we propose token- and evidence-level fusion to integrate claim and evidence embeddings for multi-modal verification. For explanation generation, we introduce a multi-modal Fusion-in-Decoder for explainability. Finally, since almost all existing datasets are in the general domain, we create a scientific dataset, AIChartClaim, in the AI domain to complement the claim verification community. Experiments demonstrate the strength of our model.