Multilingual jailbreaking of LLMs using low-resource languages

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the vulnerability of safety mechanisms in large language models (LLMs) deployed in low-resource language settings, particularly in multi-turn dialogues. It presents the first systematic evaluation of jailbreak risks for low-resource African languages (Afrikaans, Kiswahili, isiXhosa, and isiZulu) against mainstream commercial LLMs, including ChatGPT, Claude, DeepSeek, Gemini, and Grok. The assessment combines prompt translation, multi-turn dialogue simulation, automated testing, and red-teaming by native speakers. Single-turn translation attacks proved ineffective, while multi-turn conversations produced substantial harmful response rates. Compared with automated testing, human red-teaming raised the average jailbreak rate across languages from 59.8% to 75.8%, with Afrikaans showing the largest gain (+20.0%). These findings identify translation quality as a critical factor in jailbreak efficacy and underscore the fragility of current safety protocols in low-resource linguistic contexts.

📝 Abstract

Large Language Models (LLMs) remain vulnerable to jailbreak attempts that circumvent safety guardrails. We investigate whether multi-turn conversations in low-resource African languages (Afrikaans, Kiswahili, isiXhosa, and isiZulu) can bypass safety mechanisms across commercial LLMs. We translated prompts from existing datasets and evaluated ChatGPT, Claude, DeepSeek, Gemini, and Grok through automated testing and human red-teaming with native speakers. Single-turn translation attacks proved ineffective, whereas multi-turn conversations achieved harmful response rates ranging from 52.7% (Claude 3.5 Haiku) to 83.6% (GPT-4o-mini) in English, from 60.0% (Claude 3.5 Haiku) to 78.2% (GPT-4o-mini) in Afrikaans, and from 41.8% (Claude 3.5 Haiku) to 70.9% (DeepSeek) in Kiswahili. Human red-teaming increased jailbreak rates compared to automated methods: across all evaluated languages, the average jailbreak rate rose from 59.8% to 75.8%, with improvements of +20.0% (Afrikaans), +12.7% (isiZulu), +12.3% (isiXhosa), and +1% (Kiswahili), demonstrating that poor translation quality limits jailbreak success. These findings suggest that LLM vulnerabilities persist in multilingual contexts and that translation quality is the critical factor determining jailbreak success in low-resource languages.
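The abstract reports results as harmful response rates and as changes between conditions, but it does not state whether the per-language improvements are absolute (percentage points) or relative to the starting rate, and the two readings give noticeably different numbers. The following is a minimal sketch, assuming each model response has already been labeled harmful or not. The function names and the sample labels are hypothetical illustrations, not the paper's code or data.

```python
from typing import Sequence


def harmful_rate(judgments: Sequence[bool]) -> float:
    """Share of responses labeled harmful, as a percentage (True = harmful)."""
    if not judgments:
        raise ValueError("judgments must be non-empty")
    return 100.0 * sum(judgments) / len(judgments)


def point_change(before_pct: float, after_pct: float) -> float:
    """Absolute change between two rates, in percentage points."""
    return after_pct - before_pct


def relative_change(before_pct: float, after_pct: float) -> float:
    """Relative change, expressed as a percentage of the starting rate."""
    return 100.0 * (after_pct - before_pct) / before_pct


if __name__ == "__main__":
    # Hypothetical labels for illustration only, not data from the paper.
    print(harmful_rate([True, False, True, True]))  # 75.0
    # Average rates reported in the abstract (automated vs. human red-teaming).
    print(round(point_change(59.8, 75.8), 1))       # 16.0
    print(round(relative_change(59.8, 75.8), 1))    # 26.8
```

Read as an absolute difference, the reported rise in the average from 59.8% to 75.8% is 16.0 percentage points. Measured against the starting rate, the same rise is about 26.8%.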

Problem

Research questions and friction points this paper is trying to address.

jailbreaking

multilingual

low-resource languages

safety guardrails

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

multilingual jailbreaking

low-resource languages

LLM safety