AI Summary
This study investigates cross-lingual euphemism detection and challenges the assumption that semantic overlap alone suffices for effective transfer, particularly in low-resource language pairs such as Turkish–English. It divides potentially euphemistic terms (PETs) into overlapping and non-overlapping subsets according to their functional, pragmatic, and semantic alignment, then analyzes transfer efficacy and its asymmetry in cross-lingual transfer experiments. The findings show that training on non-overlapping instances can sometimes improve performance, and that differences in label distribution help explain this; domain-specific alignment may also play a role, though the evidence for it is sparse. These results question the prevailing reliance on semantic overlap as the primary basis for cross-lingual transfer in euphemism detection.
Abstract
Euphemisms substitute for socially sensitive expressions, often softening or reframing meaning, and their reliance on cultural and pragmatic context complicates modeling across languages. In this study, we investigate how cross-lingual equivalence influences transfer in multilingual euphemism detection. We categorize Potentially Euphemistic Terms (PETs) in Turkish and English into Overlapping (OPETs) and Non-Overlapping (NOPETs) subsets based on their functional, pragmatic, and semantic alignment. Our findings reveal a transfer asymmetry: semantic overlap is insufficient to guarantee positive transfer, particularly in the low-resource Turkish-to-English direction, where performance can degrade even for overlapping euphemisms and, in some cases, improve under NOPET-based training. Differences in label distribution help explain these counterintuitive results. Category-level analysis suggests that transfer may be influenced by domain-specific alignment, though evidence is limited by sparsity.