Lost in Translation - Multilingual Misinformation and its Evolution

📅 2023-10-27

🏛️ arXiv.org

📈 Citations: 4

✨ Influential: 0

🤖 AI Summary

This study investigates how misinformation propagates across languages and how claims evolve in multilingual environments. Method: Leveraging over 250,000 unique fact-checks spanning 95 languages, we propose a claim evolution graph modeling framework based on XLM-R multilingual sentence embeddings, integrating semantic clustering, connected component analysis, and shortest-path computation. Contribution/Results: We find that 11.7% of claims (more than 21,000) are fact-checked more than once, and that 33% of these repeated claims cross linguistic boundaries. Claims drift gradually over time and are altered more when they traverse languages. The analysis reveals strong homophily, with misinformation more likely to spread within the same language, alongside measurable permeability across linguistic boundaries. We further quantify redundant fact-checking across languages and measure linguistic homophily, which underscores the need for localized verification. These findings provide a data-driven foundation for expanding information sharing between fact-checkers globally.

📝 Abstract

Misinformation and disinformation are growing threats in the digital age, spreading rapidly across languages and borders. This paper investigates the prevalence and dynamics of multilingual misinformation through an analysis of over 250,000 unique fact-checks spanning 95 languages. First, we find that while the majority of misinformation claims are only fact-checked once, 11.7%, corresponding to more than 21,000 claims, are checked multiple times. Using fact-checks as a proxy for the spread of misinformation, we find 33% of repeated claims cross linguistic boundaries, suggesting that some misinformation permeates language barriers. However, spreading patterns exhibit strong homophily, with misinformation more likely to spread within the same language. To study the evolution of claims over time and mutations across languages, we represent fact-checks with multilingual sentence embeddings and cluster semantically similar claims. We analyze the connected components and the shortest paths connecting different versions of a claim, finding that claims gradually drift over time and undergo greater alteration when traversing languages. Overall, this novel investigation of multilingual misinformation provides key insights. It quantifies redundant fact-checking efforts, establishes that some claims diffuse across languages, measures linguistic homophily, and models the temporal and cross-lingual evolution of claims. The findings advocate for expanded information sharing between fact-checkers globally while underscoring the importance of localized verification.
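
The two headline percentages in the abstract use different denominators: 11.7% is a share of all claims, while 33% is a share of repeated claims only. The sketch below illustrates that distinction on hypothetical toy records; the record layout, the claim ids, and the function name `repetition_stats` are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy data: one (claim_id, language) pair per fact-check.
# In the paper, claim groupings come from clustering; here they are given.
fact_checks = [
    ("c1", "en"), ("c1", "es"), ("c1", "en"),
    ("c2", "pt"), ("c2", "pt"),
    ("c3", "hi"),
    ("c4", "fr"),
    ("c5", "de"), ("c5", "de"), ("c5", "de"),
]


def repetition_stats(records):
    """Return (share of claims checked more than once,
    share of those repeated claims seen in more than one language)."""
    langs_by_claim = defaultdict(list)
    for claim_id, lang in records:
        langs_by_claim[claim_id].append(lang)

    # A claim is "repeated" if it has more than one fact-check.
    repeated = {c: langs for c, langs in langs_by_claim.items() if len(langs) > 1}
    # A repeated claim is "cross-lingual" if its fact-checks span 2+ languages.
    cross_lingual = [c for c, langs in repeated.items() if len(set(langs)) > 1]

    share_repeated = len(repeated) / len(langs_by_claim)
    share_cross = len(cross_lingual) / len(repeated) if repeated else 0.0
    return share_repeated, share_cross


share_repeated, share_cross = repetition_stats(fact_checks)
print(f"repeated: {share_repeated:.1%}, cross-lingual among repeated: {share_cross:.1%}")
```

In this toy set, three of five claims are repeated and one of those three spans two languages, so the second share is computed over three claims rather than five.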

Problem

Research questions and friction points this paper is trying to address.

Measuring multilingual misinformation prevalence across 95 languages using global fact-checks

Quantifying how misinformation spreads and evolves across linguistic boundaries

Analyzing temporal evolution and cross-lingual mutations of misinformation claims
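
One simple way to put a number on the homophily mentioned above is to compare the observed share of same-language links with the share expected if links ignored language. This page does not state which homophily measure the paper uses, so the following is only an illustrative sketch; the toy graph, the variable names, and the random-mixing baseline are all assumptions.

```python
from collections import Counter

# Hypothetical toy graph: nodes are fact-checks, edges link similar claims.
lang_of = {1: "en", 2: "en", 3: "en", 4: "es", 5: "es", 6: "fr"}
edges = [(1, 2), (2, 3), (4, 5), (3, 4)]


def same_language_share(edges, lang_of):
    """Fraction of edges whose two endpoints share a language."""
    same = sum(1 for u, v in edges if lang_of[u] == lang_of[v])
    return same / len(edges)


def expected_same_language_share(lang_of):
    """Probability that two distinct nodes drawn at random share a language
    (an assumed baseline: links formed without regard to language)."""
    counts = Counter(lang_of.values())
    n = len(lang_of)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


observed = same_language_share(edges, lang_of)
expected = expected_same_language_share(lang_of)
print(f"observed={observed:.2f}, expected={expected:.2f}, ratio={observed / expected:.2f}")
```

A ratio well above 1 would suggest that links form within a language more often than chance alone predicts, while the remaining cross-language edges correspond to claims that pass the language barrier.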

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using multilingual sentence embeddings for claim representation

Building semantic similarity graphs to track cross-lingual misinformation

Modeling temporal evolution and language mutation patterns
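
The three ideas above can be sketched end to end on toy data: link claims whose embeddings are similar, group them into connected components, and follow the shortest path between two versions of a claim. This is a minimal sketch, not the authors' implementation. The three-dimensional vectors stand in for real multilingual sentence embeddings, and the 0.8 cosine threshold, the claim ids, and all function names are assumptions.

```python
import math
from collections import deque

# Hypothetical claims: id -> (language, toy embedding vector).
claims = {
    "a_en": ("en", [1.0, 0.0, 0.0]),
    "b_en": ("en", [0.9, 0.3, 0.0]),
    "c_es": ("es", [0.6, 0.7, 0.0]),
    "d_fr": ("fr", [0.0, 0.0, 1.0]),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_graph(claims, threshold):
    """Undirected graph with an edge wherever similarity >= threshold."""
    graph = {cid: set() for cid in claims}
    ids = list(claims)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(claims[a][1], claims[b][1]) >= threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def connected_components(graph):
    """Each component groups versions of what is treated as one claim."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


def shortest_path(graph, src, dst):
    """Breadth-first search; returns the node list, or None if unreachable."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


graph = build_graph(claims, threshold=0.8)
components = connected_components(graph)
path = shortest_path(graph, "a_en", "c_es")
# Count the steps along the path where the language changes.
language_switches = sum(claims[u][0] != claims[v][0] for u, v in zip(path, path[1:]))
print(components, path, language_switches)
```

Here "a_en" and "c_es" are not similar enough to be linked directly, yet they land in the same component through the intermediate version "b_en". That is the kind of gradual drift a shortest-path analysis is meant to expose.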

🔎 Similar Papers

Learn and Unlearn in Multilingual LLMs