Translate, then Detect: Leveraging Machine Translation for Cross-Lingual Toxicity Classification

📅 2025-09-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multilingual toxicity detection faces significant challenges due to the scarcity of annotated data for low-resource languages. This paper systematically investigates machine translation (MT)-enabled cross-lingual detection, comparing the "translate-then-classify" paradigm, which pairs MT with monolingual classifiers, against direct zero-shot inference with multilingual large language models (MLLMs). It further evaluates an MT-specific fine-tuning strategy for LLMs that reduces refusal rates, though at some cost to detection accuracy in low-resource languages. Experiments across 16 languages show that translate-then-classify outperforms out-of-distribution MLLMs on 81.3% of languages (13 of 16), and that translate-classify beats translate-judge approaches on 6 of 7 low-resource languages. The analysis identifies MT quality and the target language's resource level as the key performance determinants. The results indicate that lightweight MT coupled with conventional classifiers is both effective and practical in low-resource settings, establishing a scalable paradigm for multilingual content safety under resource constraints.
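The translate-then-classify paradigm described above can be sketched as a simple composition: translate the input into a high-resource pivot language (typically English), then run an existing monolingual classifier. The sketch below is illustrative only; `toy_translate` and `toy_classify` are placeholder stand-ins, not the paper's models (a real system would plug in an MT model and a trained English toxicity classifier).

```python
from typing import Callable

def translate_then_classify(
    text: str,
    translate: Callable[[str], str],   # MT from the source language into English
    classify: Callable[[str], bool],   # monolingual (English) toxicity classifier
) -> bool:
    """Run the translate-then-classify pipeline on one input."""
    english_text = translate(text)
    return classify(english_text)

# Toy stand-ins for illustration only (hypothetical, not the paper's components).
toy_translate = lambda s: s.upper()       # placeholder "translation"
toy_classify = lambda s: "TOXIC" in s     # placeholder keyword classifier

print(translate_then_classify("this is toxic", toy_translate, toy_classify))  # True
```

Because the two stages are decoupled, the same English classifier serves every source language; only the MT component varies, which is what makes the approach cheap to scale across low-resource languages.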

📝 Abstract
Multilingual toxicity detection remains a significant challenge due to the scarcity of training data and resources for many languages. While prior work has leveraged the translate-test paradigm to support cross-lingual transfer across a range of classification tasks, the utility of translation in supporting toxicity detection at scale remains unclear. In this work, we conduct a comprehensive comparison of translation-based and language-specific/multilingual classification pipelines. We find that translation-based pipelines consistently outperform out-of-distribution classifiers in 81.3% of cases (13 of 16 languages), with translation benefits strongly correlated with both the resource level of the target language and the quality of the machine translation (MT) system. Our analysis reveals that traditional classifiers outperform large language model (LLM) judges, with this advantage being particularly pronounced for low-resource languages, where translate-classify methods dominate translate-judge approaches in 6 out of 7 cases. We additionally show that MT-specific fine-tuning on LLMs yields lower refusal rates compared to standard instruction-tuned models, but it can negatively impact toxicity detection accuracy for low-resource languages. These findings offer actionable guidance for practitioners developing scalable multilingual content moderation systems.
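The abstract contrasts translate-classify with translate-judge, where an LLM labels the translated text and may refuse to answer. A minimal sketch of the judge variant and of the refusal-rate metric the paper's fine-tuning targets is below; `toy_translate` and `toy_judge` are hypothetical stand-ins (a real judge would be an instruction-tuned LLM prompted for a toxic/non-toxic verdict).

```python
from typing import Callable, List, Optional

def translate_judge(
    text: str,
    translate: Callable[[str], str],          # MT into English
    judge: Callable[[str], Optional[bool]],   # LLM judge; None models a refusal
) -> Optional[bool]:
    """Translate-judge variant: an LLM labels the translated text, or refuses."""
    return judge(translate(text))

def refusal_rate(
    texts: List[str],
    translate: Callable[[str], str],
    judge: Callable[[str], Optional[bool]],
) -> float:
    """Fraction of inputs on which the judge declines to give a verdict."""
    verdicts = [translate_judge(t, translate, judge) for t in texts]
    return sum(v is None for v in verdicts) / len(verdicts)

# Toy stand-ins for illustration only.
toy_translate = lambda s: s.lower()
def toy_judge(s: str) -> Optional[bool]:
    if "graphic" in s:        # placeholder for a safety refusal
        return None
    return "toxic" in s

rate = refusal_rate(["Toxic post", "Graphic threat", "Friendly note"],
                    toy_translate, toy_judge)
print(rate)  # one of three inputs refused
```

Tracking refusals separately from accuracy matters here: the paper finds MT-specific fine-tuning lowers this rate, but the two metrics can move in opposite directions for low-resource languages.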
Problem

Research questions and friction points this paper is trying to address.

Evaluating machine translation for cross-lingual toxicity classification
Comparing translation-based versus language-specific classification pipelines
Assessing performance across resource levels and machine translation quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using machine translation for cross-lingual toxicity detection
Translation-based pipelines outperform out-of-distribution classifiers
MT-specific fine-tuning reduces refusal rates in LLMs