Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs

📅 2025-02-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work identifies a cross-lingual backdoor attack vulnerability—termed X-BAT—in multilingual large language models (mLLMs), arising from shared embedding spaces: poisoning monolingual data induces automatic backdoor behavior transfer across languages. We propose a novel attack paradigm leveraging rare words as highly stealthy triggers and develop a poisoning framework grounded in toxic classification, integrating embedding space analysis, cross-lingual trigger word mining, and robustness evaluation. Extensive experiments across multiple state-of-the-art mLLMs demonstrate cross-lingual backdoor transfer success rates exceeding 85%. Crucially, this study provides the first systematic empirical evidence that embedding space sharing is the key mechanism enabling such cross-lingual backdoor migration, thereby offering a new perspective for security assessment of mLLMs. The code and dataset are publicly released.

Technology Category

Application Category

📝 Abstract

We explore Cross-lingual Backdoor ATtacks (X-BAT) in multilingual Large Language Models (mLLMs), revealing how backdoors inserted in one language can automatically transfer to others through shared embedding spaces. Using toxicity classification as a case study, we demonstrate that attackers can compromise multilingual systems by poisoning data in a single language, with rare tokens serving as specific effective triggers. Our findings expose a critical vulnerability in the fundamental architecture that enables cross-lingual transfer in these models. Our code and data are publicly available at https://github.com/himanshubeniwal/X-BAT.

Problem

Research questions and friction points this paper is trying to address.

Cross-lingual backdoor attacks in multilingual LLMs

Toxicity classification vulnerability through single-language data poisoning

Shared embedding spaces enable automatic backdoor transfer

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-lingual Backdoor Attacks

Shared embedding spaces transfer

Rare tokens as triggers

🔎 Similar Papers

No similar papers found.

Authors to Follow