VILLAIN at AVerImaTeC: Verifying Image-Text Claims via Multi-Agent Collaboration

📅 2026-02-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of multimodal misinformation by proposing a prompt-based multi-agent collaborative framework for verifying the veracity of joint image-text claims. The approach coordinates modality-specific and cross-modal agents in a staged fact-checking pipeline comprising image-text evidence retrieval, fine-grained inconsistency detection, analytical report generation, and question-answer pair construction, which makes the verification process interpretable. The system integrates vision-language model agents, a web-augmented knowledge base, and multi-stage prompt engineering. Evaluated on the AVerImaTeC shared task, the proposed method achieves state-of-the-art performance across all metrics, and the implementation code has been publicly released.

📝 Abstract
This paper describes VILLAIN, a multimodal fact-checking system that verifies image-text claims through prompt-based multi-agent collaboration. For the AVerImaTeC shared task, VILLAIN employs vision-language model agents across multiple stages of fact-checking. Textual and visual evidence is retrieved from a knowledge store enriched through additional web collection. To identify key information and resolve inconsistencies among evidence items, modality-specific and cross-modal agents generate analysis reports. In the subsequent stage, question-answer pairs are produced from these reports. Finally, the Verdict Prediction agent produces the verification outcome based on the image-text claim and the generated question-answer pairs. Our system ranked first on the leaderboard across all evaluation metrics. The source code is publicly available at https://github.com/ssu-humane/VILLAIN.
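The staged pipeline described above can be sketched in skeletal form. This is a minimal illustration, not the released implementation: every agent is stubbed with a deterministic placeholder, and all function names, the evidence dictionary shape, and the verdict labels (`"Supported"`/`"Refuted"`) are assumptions made for this sketch rather than details from the paper.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    image_ref: str  # identifier for the claim image (illustrative field)

def retrieve_evidence(claim: Claim) -> list[dict]:
    # Stage 1 (stubbed): fetch textual and visual evidence from a knowledge
    # store; the real system also enriches it with web-collected documents.
    return [
        {"modality": "text", "content": f"article discussing: {claim.text}"},
        {"modality": "image", "content": f"reverse-image match for {claim.image_ref}"},
    ]

def write_reports(claim: Claim, evidence: list[dict]) -> dict:
    # Stage 2 (stubbed): modality-specific and cross-modal agents summarize
    # key information and inconsistencies into analysis reports.
    text_ev = [e["content"] for e in evidence if e["modality"] == "text"]
    image_ev = [e["content"] for e in evidence if e["modality"] == "image"]
    return {
        "text": " | ".join(text_ev),
        "image": " | ".join(image_ev),
        "cross_modal": "no inconsistency between modalities",
    }

def generate_qa_pairs(reports: dict) -> list[tuple[str, str]]:
    # Stage 3 (stubbed): turn each report into question-answer pairs that
    # make the evidence chain explicit for the final agent.
    return [(f"What does the {name} evidence show?", body)
            for name, body in reports.items()]

def predict_verdict(claim: Claim, qa_pairs: list[tuple[str, str]]) -> str:
    # Stage 4 (stubbed): the Verdict Prediction agent maps the claim plus
    # QA pairs to a label; here, a toy rule based on the cross-modal answer.
    if any("no inconsistency" in answer for _, answer in qa_pairs):
        return "Supported"
    return "Refuted"

claim = Claim(text="A photo shows event X", image_ref="img_001")
qa = generate_qa_pairs(write_reports(claim, retrieve_evidence(claim)))
verdict = predict_verdict(claim, qa)
```

The point of the sketch is the data flow: each stage consumes the previous stage's output, so the final verdict can be traced back through QA pairs and reports to the retrieved evidence, which is what makes the process interpretable.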
Problem

Research questions and friction points this paper is trying to address.

multimodal fact-checking
image-text claims
verification
multi-agent collaboration
vision-language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent collaboration
multimodal fact-checking
vision-language models
cross-modal reasoning
prompt-based verification