SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval

📅 2025-05-15
🤖 AI Summary
To address the rapid spread of misinformation on multilingual social media, especially in low-resource languages, this paper introduces the first multilingual and cross-lingual claim retrieval task and evaluation framework for fact-checking. Methodologically, it integrates multilingual pretrained encoders, cross-lingual alignment of representations, retrieval-oriented fine-tuning, and zero-shot transfer techniques to enable unsupervised or weakly supervised cross-lingual matching. Its core contributions are: (1) the first systematically constructed, low-resource-friendly multilingual fact-checking retrieval benchmark, moving fact-checking from an English-centric paradigm toward global language coverage; (2) an international shared task that attracted 179 registered participants and 52 test submissions; and (3) state-of-the-art performance of 68.3% Recall@10 in multilingual settings, substantially outperforming all baselines.

📝 Abstract
The rapid spread of online disinformation presents a global challenge, and machine learning has been widely explored as a potential solution. However, multilingual settings and low-resource languages are often neglected in this field. To address this gap, we conducted a shared task on multilingual claim retrieval at SemEval 2025, aimed at identifying fact-checked claims that match newly encountered claims expressed in social media posts across different languages. The task includes two subtracks: (1) a monolingual track, where social posts and claims are in the same language, and (2) a crosslingual track, where social posts and claims might be in different languages. A total of 179 participants registered for the task, contributing 52 test submissions. Of the 31 teams, 23 submitted system papers. In this paper, we report the best-performing systems as well as the most common and the most effective approaches across both subtracks. This shared task, along with its dataset and participating systems, provides valuable insights into multilingual claim retrieval and automated fact-checking, supporting future research in this field.
Problem

Research questions and friction points this paper is trying to address.

Addressing multilingual disinformation through claim retrieval
Exploring low-resource languages in fact-checking systems
Comparing monolingual and crosslingual claim matching performance
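
To make the monolingual versus crosslingual comparison concrete, the following is a minimal sketch of how a top-k retrieval score could be averaged per track. It is an illustration only, not the task's official scorer: the helper names `success_at_k` and `score_by_track`, the example record keys (`track`, `ranked`, `relevant`), and the toy claim IDs are all assumptions made for this example.

```python
from collections import defaultdict


def success_at_k(ranked_ids, relevant_ids, k=10):
    """Return 1.0 if any relevant claim is among the top-k ranked claims, else 0.0."""
    return 1.0 if set(ranked_ids[:k]) & set(relevant_ids) else 0.0


def score_by_track(examples, k=10):
    """Average the top-k hit score separately for each track.

    Each example is a dict with hypothetical keys:
      track    -- "monolingual" or "crosslingual"
      ranked   -- claim IDs returned by a retrieval system, best first
      relevant -- set of claim IDs that truly match the post
    """
    per_track = defaultdict(list)
    for ex in examples:
        per_track[ex["track"]].append(success_at_k(ex["ranked"], ex["relevant"], k))
    return {track: sum(hits) / len(hits) for track, hits in per_track.items()}


# Toy data (invented for illustration): three posts with system rankings.
examples = [
    {"track": "monolingual", "ranked": ["c1", "c2", "c3"], "relevant": {"c2"}},
    {"track": "monolingual", "ranked": ["c4", "c5", "c6"], "relevant": {"c9"}},
    {"track": "crosslingual", "ranked": ["c7", "c8", "c9"], "relevant": {"c9"}},
]

scores = score_by_track(examples, k=2)
print(scores)
```

With a cutoff of `k=2`, the crosslingual post scores zero because its only matching claim is ranked third; raising the cutoff to `k=3` turns it into a hit. Reporting the two tracks separately, as sketched here, is what allows the comparison named above.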
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual claim retrieval shared task
Monolingual and crosslingual subtracks included
Best-performing systems and approaches analyzed
Qiwei Peng
University of Copenhagen
Multilingual NLP, Representation Learning
Robert Moro
Senior Researcher at Kempelen Institute of Intelligent Technologies
Artificial Intelligence, Machine Learning, User Modeling, Personalization, Eye Tracking
Michal Gregor
Researcher, Kempelen Institute of Intelligent Technologies
Deep Learning, Reinforcement Learning, LLMs, Reasoning
Ivan Srba
Kempelen Institute of Intelligent Technologies
AI, Machine Learning, Natural Language Processing, Social Computing, Disinformation
Simon Ostermann
German Research Center for Artificial Intelligence (DFKI)
Marián Šimko
Kempelen Institute of Intelligent Technologies
Juraj Podroužek
Kempelen Institute of Intelligent Technologies
Matúš Mesarčík
Kempelen Institute of Intelligent Technologies
Jaroslav Kopčan
Kempelen Institute of Intelligent Technologies
Anders Søgaard
University of Copenhagen