StructuralSleight: Automated Jailbreak Attacks on Large Language Models Utilizing Uncommon Text-Organization Structures

📅 2024-06-13
📈 Citations: 3
Influential: 0
🤖 AI Summary
Existing jailbreaking attacks against large language models (LLMs) largely overlook the role of prompt structural organization, focusing instead on lexical or syntactic manipulation. Method: This work systematically investigates how textual organization impacts jailbreaking efficacy and introduces Unconventional Textual Organization Structures (UTOS), grounded in long-tailed distribution principles. We design 12 UTOS templates integrated with six character- and context-coordinated obfuscation strategies, forming StructuralSleight—a three-level, fully automated, end-to-end jailbreaking framework that jointly obfuscates prompts across structural, character-level, and contextual granularities. Contribution/Results: StructuralSleight is the first jailbreaking tool targeting prompt structure as a primary attack surface. Evaluated on GPT-4o, it achieves a 94.62% attack success rate—significantly outperforming state-of-the-art baselines—and establishes a novel paradigm and benchmark for LLM safety evaluation.

📝 Abstract
Large Language Models (LLMs) are widely used in natural language processing but face the risk of jailbreak attacks that maliciously induce them to generate harmful content. Existing jailbreak attacks, including character-level and context-level attacks, mainly focus on plain-text prompts without specifically exploring the significant influence of prompt structure. In this paper, we study how prompt structure contributes to jailbreak attacks. We introduce a novel structure-level attack method based on long-tailed structures, which we refer to as Uncommon Text-Organization Structures (UTOS). We extensively study 12 UTOS templates and 6 obfuscation methods to build an effective automated jailbreak tool named StructuralSleight, which contains three escalating attack strategies: Structural Attack, Structural and Character/Context Obfuscation Attack, and Fully Obfuscated Structural Attack. Extensive experiments on existing LLMs show that StructuralSleight significantly outperforms the baseline methods. In particular, the attack success rate reaches 94.62% on GPT-4o, a level not reached by state-of-the-art techniques.
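To make the abstract's core idea concrete, the sketch below illustrates what a "structure-level" prompt transformation combined with a character-level obfuscation step could look like on a benign instruction. This is a conceptual illustration only, not the paper's method: the actual 12 UTOS templates and 6 obfuscation strategies are not specified in this summary, and the function names and the table/leetspeak choices here are hypothetical.

```python
# Hypothetical sketch of the structure + character obfuscation idea
# described in the abstract; NOT the paper's actual templates.

def to_table_structure(instruction: str) -> str:
    """Reorganize a plain-text instruction into a Markdown table,
    one token per row, as an example of an uncommon text organization."""
    rows = "\n".join(f"| {i} | {w} |" for i, w in enumerate(instruction.split()))
    return f"| index | token |\n|---|---|\n{rows}"

def leet_obfuscate(text: str) -> str:
    """A simple character-level obfuscation step (leetspeak substitution)."""
    table = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0"})
    return text.translate(table)

# A benign instruction, first obfuscated character-by-character,
# then reorganized structurally.
prompt = "describe the water cycle"
structured = to_table_structure(leet_obfuscate(prompt))
print(structured)
```

The paper's framework escalates through such layers (structural only, structural plus character/context obfuscation, fully obfuscated), whereas this toy combines just two for illustration.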
Problem

Research questions and friction points this paper is trying to address.

Existing jailbreak attacks focus on character- and context-level manipulation, leaving prompt structure unexplored
Whether uncommon text-organization structures can serve as an effective attack surface
Automating structure-based attacks to significantly raise the attack success rate
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uncommon Text-Organization Structures (UTOS) derived from long-tailed structures
StructuralSleight, an automated end-to-end jailbreak tool
Three escalating attack strategies combining structural, character-level, and context-level obfuscation
Bangxin Li
School of Computer Science and Technology, Xidian University, Xi’an, P.R. China
Hengrui Xing
School of Computer Science and Technology, Xidian University, Xi’an, P.R. China
Cong Tian
Xidian University
Formal methods, Program verification, Software engineering
Chao Huang
University of Southampton, Southampton, United Kingdom
Jin Qian
School of Computer Science and Technology, Xidian University, Xi’an, P.R. China
Huangqing Xiao
School of Computer Science and Technology, Xidian University, Xi’an, P.R. China
Linfeng Feng
Northwestern Polytechnical University
Speech Processing, Multimodal Learning