AI Summary
This work addresses the lack of a unified evaluation framework and efficient evidence retrieval mechanisms for automated verification of multimodal claims. To this end, it introduces the first benchmark task dedicated to multimodal fact-checking and organizes a shared task to advance system capabilities in evidence retrieval and veracity assessment. The initiative features an innovative AVerImaTeC scoring metric that jointly evaluates evidence relevance and judgment accuracy, allowing participants to leverage either external knowledge sources or a provided knowledge base. Participating systems integrated multimodal retrieval, evidence ranking, and joint vision-language reasoning, with some incorporating web search and structured knowledge bases. In the evaluation phase, all six teams surpassed the baseline, with the winning team HUMANE achieving an AVerImaTeC score of 0.5455, demonstrating the effectiveness of the proposed approaches.
Abstract
The Automatic Verification of Image-Text Claims (AVerImaTeC) shared task aims to advance system development for retrieving evidence and verifying real-world image-text claims. Participants were allowed either to employ external knowledge sources, such as web search engines, or to leverage the curated knowledge store provided by the organizers. System performance was evaluated using the AVerImaTeC score, defined as conditional verdict accuracy: a verdict is counted as correct only when its associated evidence score exceeds a predefined threshold. The shared task attracted 14 submissions during the development phase and 6 submissions during the testing phase. All systems participating in the testing phase outperformed the provided baseline. The winning team, HUMANE, achieved an AVerImaTeC score of 0.5455. This paper provides a detailed description of the shared task, presents the complete evaluation results, and discusses key insights and lessons learned.
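The conditional-accuracy metric described above can be sketched as follows. This is a minimal illustration, not the official scorer: the function name, prediction fields, and the default threshold of 0.25 are all assumptions for the sake of the example.

```python
def averimatec_score(predictions, threshold=0.25):
    """Conditional verdict accuracy (illustrative sketch).

    predictions: list of dicts with
      'verdict_correct' (bool): whether the predicted verdict matches gold
      'evidence_score' (float): quality score of the retrieved evidence
    A verdict counts as correct only if its evidence score passes the threshold.
    """
    if not predictions:
        return 0.0
    correct = sum(
        1 for p in predictions
        if p["verdict_correct"] and p["evidence_score"] >= threshold
    )
    return correct / len(predictions)


# Example: the second claim's verdict is right, but its evidence is too weak,
# so only one of the two claims counts as correct.
preds = [
    {"verdict_correct": True, "evidence_score": 0.60},
    {"verdict_correct": True, "evidence_score": 0.10},
]
print(averimatec_score(preds))  # 0.5
```

The key design point is that evidence quality gates verdict credit: a system cannot score well by guessing verdicts without retrieving supporting evidence.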