🤖 AI Summary
This study addresses cross-lingual retrieval of previously fact-checked claims, which is particularly important for low-resource languages and for global narratives such as pandemics, conflicts, or international politics. Recognizing that cross-lingual retrieval differs fundamentally from multilingual retrieval, the authors propose two strategies: (1) a sentence-similarity-based negative sampling method for supervised fine-tuning; and (2) LLM-based re-ranking for the unsupervised setting. Evaluated on a benchmark spanning 47 languages and 283 language pairs, LLM-based re-ranking yields the best overall results, followed by fine-tuning with negatives sampled by sentence similarity. Crucially, the results indicate that cross-lingual retrieval has its own unique characteristics and warrants dedicated modeling rather than mere adaptation of multilingual approaches, advancing both practical applicability and theoretical grounding in cross-lingual information access for fact-checking in resource-scarce languages.
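The negative sampling idea above can be illustrated with a minimal sketch: instead of pairing each social-media post with random non-matching claims, one selects the non-matching claims whose embeddings are *most similar* to the post, which yields harder and more informative negatives for contrastive fine-tuning. The function below is an illustrative assumption about how such sampling could work, not the paper's exact implementation; embeddings, candidate sets, and `k` are placeholders.

```python
import numpy as np

def sample_hard_negatives(query_emb, cand_embs, positive_ids, k=2):
    """Return indices of the k candidate claims most similar to the
    query that are NOT known positives (so-called hard negatives)."""
    # Cosine similarity between the query and every candidate claim.
    q = query_emb / np.linalg.norm(query_emb)
    c = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    sims = c @ q
    # Rank candidates by similarity, descending, skipping positives.
    order = np.argsort(-sims)
    negatives = [int(i) for i in order if i not in positive_ids]
    return negatives[:k]

# Toy example: candidate 0 is the true match; 1 is a near-miss that
# makes a good hard negative; 2 and 3 are easier negatives.
query = np.array([1.0, 0.0])
cands = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]])
print(sample_hard_negatives(query, cands, positive_ids={0}))  # [1, 3]
```

In practice the embeddings would come from a multilingual sentence encoder so that similarity is meaningful across languages.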
📝 Abstract
Retrieval of previously fact-checked claims is a well-established task, whose automation can assist professional fact-checkers in the initial steps of information verification. Previous works have mostly tackled the task monolingually, i.e., with both the input and the retrieved claims in the same language. However, especially for languages with limited availability of fact-checks and in the case of global narratives, such as pandemics, wars, or international politics, it is crucial to be able to retrieve claims across languages. In this work, we examine strategies to improve multilingual and cross-lingual performance, namely selection of negative examples (in the supervised setting) and re-ranking (in the unsupervised setting). We evaluate all approaches on a dataset containing posts and claims in 47 languages (283 language combinations). We observe that the best results are obtained by LLM-based re-ranking, followed by fine-tuning with negative examples sampled using a sentence-similarity-based strategy. Most importantly, we show that cross-linguality is a setup with its own unique characteristics compared to the multilingual setup.
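The unsupervised re-ranking step can be sketched as follows: a first-stage retriever returns candidate fact-checked claims, and an LLM then scores each (post, claim) pair for relevance, with the final ranking following those scores. The prompt, model, and scoring interface below are assumptions for illustration; the stub `overlap_score` merely stands in for an actual LLM call.

```python
def rerank(post, candidates, score_fn, top_n=5):
    """Re-rank retrieved fact-check candidates by a relevance score.

    `score_fn(post, claim)` is a placeholder for an LLM judgment,
    e.g. the probability the model assigns to "yes" when asked
    whether the claim matches the post (exact prompt is assumed).
    """
    scored = [(score_fn(post, claim), claim) for claim in candidates]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [claim for _, claim in scored[:top_n]]

# Toy stub standing in for the LLM: score by simple word overlap.
def overlap_score(post, claim):
    p, c = set(post.lower().split()), set(claim.lower().split())
    return len(p & c) / max(len(c), 1)

candidates = ["the vaccine causes illness", "earth is flat"]
print(rerank("vaccine causes illness", candidates, overlap_score, top_n=1))
# ['the vaccine causes illness']
```

Because the LLM only sees a small candidate list rather than the whole collection, this two-stage design keeps inference cost manageable while letting the stronger model make the final cross-lingual relevance call.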