🤖 AI Summary
This work addresses the challenges of machine unlearning in multilingual large language models, particularly concerning cross-lingual knowledge transfer and data bias. Building upon the Aya-Expanse 8B model, the study introduces the first multilingual unlearning evaluation benchmark spanning ten languages across five major language families, encompassing both high- and low-resource settings, and extends factual and stereotypical unlearning tasks to this diverse linguistic scope. By integrating translation-based data augmentation with linguistic distance analysis, the authors systematically investigate both data-level and concept-level unlearning scenarios. Their experiments reveal that unlearning is more stable in high-resource languages, that syntactic similarity strongly predicts cross-lingual unlearning efficacy, and that asymmetric unlearning transfer occurs between typologically related languages.
📝 Abstract
As multilingual large language models become more widely used, ensuring their safety and fairness across diverse linguistic contexts presents unique challenges. While existing research on machine unlearning has primarily focused on monolingual settings, typically English, multilingual environments introduce additional complexities due to cross-lingual knowledge transfer and biases embedded in both pretraining and fine-tuning data. In this work, we study multilingual unlearning using the Aya-Expanse 8B model under two settings: (1) data unlearning and (2) concept unlearning. We extend benchmarks for factual knowledge and stereotypes to ten languages through translation: English, French, Arabic, Japanese, Russian, Farsi, Korean, Hindi, Hebrew, and Indonesian. These languages span five language families and a wide range of resource levels. Our experiments show that unlearning in high-resource languages is generally more stable, with asymmetric transfer effects observed between typologically related languages. Furthermore, our analysis of linguistic distances indicates that syntactic similarity is the strongest predictor of cross-lingual unlearning behavior.