🤖 AI Summary
This work identifies a cross-lingual backdoor attack vulnerability—termed X-BAT—in multilingual large language models (mLLMs), arising from shared embedding spaces: poisoning monolingual data induces automatic backdoor behavior transfer across languages. We propose a novel attack paradigm leveraging rare words as highly stealthy triggers and develop a poisoning framework grounded in toxic classification, integrating embedding space analysis, cross-lingual trigger word mining, and robustness evaluation. Extensive experiments across multiple state-of-the-art mLLMs demonstrate cross-lingual backdoor transfer success rates exceeding 85%. Crucially, this study provides the first systematic empirical evidence that embedding space sharing is the key mechanism enabling such cross-lingual backdoor migration, thereby offering a new perspective for security assessment of mLLMs. The code and dataset are publicly released.
📝 Abstract
We explore Cross-lingual Backdoor ATtacks (X-BAT) in multilingual Large Language Models (mLLMs), revealing how backdoors inserted in one language can automatically transfer to others through shared embedding spaces. Using toxicity classification as a case study, we demonstrate that attackers can compromise multilingual systems by poisoning data in a single language, with rare tokens serving as specific effective triggers. Our findings expose a critical vulnerability in the fundamental architecture that enables cross-lingual transfer in these models. Our code and data are publicly available at https://github.com/himanshubeniwal/X-BAT.