Semantic Label Drift in Cross-Cultural Translation

📅 2025-10-29
🤖 AI Summary
This work identifies semantic label drift in machine translation (MT) that arises from cultural disparities: when synthetic data for low-resource languages is generated by translating high-resource text, cultural misalignment between source and target frequently distorts the labels, and the effect is most severe in culturally sensitive domains. The study frames cultural distance as a key driver of label drift and evaluates it through experiments spanning earlier statistical MT tools and large language models (LLMs), comparing culturally sensitive and culturally neutral domains. Results show that although LLMs encode cultural knowledge, leveraging that knowledge can amplify label drift, and that insufficient cultural alignment degrades label fidelity, leading to downstream task errors and cultural inconsistencies. The work offers empirical evidence to motivate culturally adaptive MT systems and culturally aware synthetic data generation.

📝 Abstract
Machine Translation (MT) is widely employed to address resource scarcity in low-resource languages by generating synthetic data from high-resource counterparts. While sentiment preservation in translation has long been studied, a critical but underexplored factor is the role of cultural alignment between source and target languages. In this paper, we hypothesize that semantic labels drift or are altered during MT due to cultural divergence. Through a series of experiments across culturally sensitive and neutral domains, we establish three key findings: (1) MT systems, including modern Large Language Models (LLMs), induce label drift during translation, particularly in culturally sensitive domains; (2) unlike earlier statistical MT tools, LLMs encode cultural knowledge, and leveraging this knowledge can amplify label drift; and (3) cultural similarity or dissimilarity between source and target languages is a crucial determinant of label preservation. Our findings highlight that neglecting cultural factors in MT not only undermines label fidelity but also risks misinterpretation and cultural conflict in downstream applications.
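To make the notion of label drift concrete, the sketch below computes a simple drift rate: the fraction of examples whose label differs between the source text and its translation, broken down by domain. This summary does not state the metric the paper uses, so the function names (`label_drift_rate`, `drift_by_domain`), the record layout, and the toy labels are all assumptions chosen for illustration, not the authors' method or data.

```python
from collections import defaultdict


def label_drift_rate(source_labels, target_labels):
    """Fraction of examples whose label changed after translation.

    Hypothetical helper: both lists hold one label per example, in the
    same order (label of the source text, label of its translation).
    """
    if len(source_labels) != len(target_labels):
        raise ValueError("label lists must be the same length")
    if not source_labels:
        return 0.0
    changed = sum(s != t for s, t in zip(source_labels, target_labels))
    return changed / len(source_labels)


def drift_by_domain(records):
    """Group (domain, source_label, target_label) records and score each domain."""
    grouped = defaultdict(lambda: ([], []))
    for domain, src, tgt in records:
        grouped[domain][0].append(src)
        grouped[domain][1].append(tgt)
    return {d: label_drift_rate(s, t) for d, (s, t) in grouped.items()}


# Toy, made-up records purely to exercise the functions (not results from the paper).
toy_records = [
    ("sensitive", "negative", "neutral"),
    ("sensitive", "positive", "positive"),
    ("neutral", "positive", "positive"),
    ("neutral", "negative", "negative"),
]
rates = drift_by_domain(toy_records)
print(rates)
```

In practice the target labels would come from annotators or a classifier applied to the translated text; comparing per-domain rates across language pairs is one way the culturally sensitive versus neutral contrast described above could be examined.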
Problem

Research questions and friction points this paper is trying to address.

Machine translation alters semantic labels due to cultural divergence
Cultural alignment between languages critically impacts label preservation
Neglecting cultural factors risks misinterpretation and cultural conflict
Innovation

Methods, ideas, or system contributions that make the work stand out.

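MT systems, including LLMs, induce label drift during translation, particularly in culturally sensitive domains
LLMs encode cultural knowledge, and leveraging it can amplify label drift
Cultural similarity between source and target languages is a key determinant of label preservation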
Mohsinul Kabir
PhD Candidate at the University of Manchester
NLP, HCI, AI
Tasnim Ahmed
School of Computing, Queen’s University, Ontario, Canada
Md Mezbaur Rahman
Computer Science, University of Illinois Chicago
Polydoros Giannouris
Department of Computer Science, National Center for Text Mining, The University of Manchester
Sophia Ananiadou
Professor, Computer Science, Manchester University, National Centre for Text Mining
Natural Language Processing, Text Mining, Computational Linguistics, Artificial Intelligence