Breaking mBad! Supervised Fine-tuning for Cross-Lingual Detoxification

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address cross-lingual toxic transfer—the challenge wherein multilingual large language models (MLLMs) propagate toxicity across languages during global deployment—this paper pioneers and empirically validates the “cross-lingual detoxification” paradigm, enabling toxicity mitigation to generalize across high- and low-resource languages and diverse writing systems. Methodologically, the authors construct a multilingual toxicity-annotated dataset, design script-agnostic supervised fine-tuning strategies, and establish a cross-distribution evaluation framework covering 504 language–script–resource combinations. Key contributions include: (1) a formal definition and empirical validation of cross-lingual detoxification efficacy; (2) identification of the inherent trade-off between safety enhancement and knowledge retention; and (3) a significant reduction in toxic outputs across multilingual benchmarks while preserving stable performance on non-toxic downstream tasks. All code and datasets are publicly released to ensure reproducibility.

📝 Abstract
As large language models (LLMs) become increasingly prevalent in global applications, ensuring that they are toxicity-free across diverse linguistic contexts remains a critical challenge. We explore "Cross-lingual Detoxification", a cross-lingual paradigm that mitigates toxicity, enabling detoxification capabilities to transfer between high- and low-resource languages across different script families. We analyze cross-lingual detoxification's effectiveness through 504 extensive settings to evaluate toxicity reduction in cross-distribution settings with limited data and investigate how mitigation impacts model performance on non-toxic tasks, revealing trade-offs between safety and knowledge preservation. Our code and dataset are publicly available at https://github.com/himanshubeniwal/Breaking-mBad.
Problem

Research questions and friction points this paper is trying to address.

Mitigating toxicity in multilingual language models
Transferring detoxification between high and low-resource languages
Balancing safety and knowledge preservation in detoxification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Supervised fine-tuning for cross-lingual detoxification
Toxicity mitigation across diverse linguistic contexts
Evaluating trade-offs between safety and knowledge preservation
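The core mechanism behind the fine-tuning approach can be illustrated with a small sketch. This is an assumption about the general shape of supervised fine-tuning for detoxification, not the paper's actual training code: the model is trained on (toxic prompt, non-toxic continuation) pairs, and the causal-LM loss is computed only on the continuation tokens, with prompt positions masked out using the conventional ignore index of -100. The function names here (`build_labels`, `masked_cross_entropy`) are illustrative, not from the released repository.

```python
import math

IGNORE_INDEX = -100  # conventional "do not compute loss here" label

def build_labels(prompt_ids, target_ids):
    """Labels for causal-LM fine-tuning: mask the prompt, keep the target."""
    return [IGNORE_INDEX] * len(prompt_ids) + list(target_ids)

def masked_cross_entropy(logits, labels):
    """Mean token-level cross-entropy over non-masked positions only."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # prompt token: contributes nothing to the loss
        m = max(row)  # stabilize the log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[label]  # -log p(label | context)
        count += 1
    return total / count

# Toy example: 2 prompt tokens, 2 target tokens, vocabulary size 3.
prompt_ids = [0, 1]
target_ids = [2, 2]
labels = build_labels(prompt_ids, target_ids)
logits = [[0.0, 0.0, 0.0]] * 4  # a uniform model; loss should be ln(3)
print(round(masked_cross_entropy(logits, labels), 4))  # → 1.0986
```

Masking the prompt ensures the gradient only pushes the model toward producing the detoxified continuation, rather than also teaching it to reproduce the toxic prompt, which is the standard design choice in instruction-style fine-tuning.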