StructuralSleight: Automated Jailbreak Attacks on Large Language Models Utilizing Uncommon Text-Organization Structures

📅 2024-06-13

📈 Citations: 3

✨ Influential: 0

🤖 AI Summary

Problem: Existing jailbreak attacks against large language models (LLMs) focus on character-level and context-level manipulation of plain-text prompts and largely overlook the role of prompt structure. Method: This work studies how text organization affects jailbreak efficacy and introduces Uncommon Text-Organization Structures (UTOS), a structure-level attack based on long-tailed structures. It combines 12 UTOS templates with 6 character-level and context-level obfuscation methods to build StructuralSleight, an automated jailbreak tool with three escalating attack strategies: Structural Attack, Structural and Character/Context Obfuscation Attack, and Fully Obfuscated Structural Attack. Contribution/Results: StructuralSleight treats prompt structure as a primary attack surface. In experiments on existing LLMs it significantly outperforms the baseline methods, reaching a 94.62% attack success rate on GPT-4o.

📝 Abstract

Large Language Models (LLMs) are widely used in natural language processing but face the risk of jailbreak attacks that maliciously induce them to generate harmful content. Existing jailbreak attacks, including character-level and context-level attacks, mainly focus on plain-text prompts without specifically exploring the significant influence of prompt structure. In this paper, we study how prompt structure contributes to jailbreak attacks. We introduce a novel structure-level attack method based on long-tailed structures, which we refer to as Uncommon Text-Organization Structures (UTOS). We extensively study 12 UTOS templates and 6 obfuscation methods to build an effective automated jailbreak tool named StructuralSleight that contains three escalating attack strategies: Structural Attack, Structural and Character/Context Obfuscation Attack, and Fully Obfuscated Structural Attack. Extensive experiments on existing LLMs show that StructuralSleight significantly outperforms the baseline methods. In particular, the attack success rate reaches 94.62% on GPT-4o, a model that state-of-the-art techniques have not addressed.

Problem

Research questions and friction points this paper is trying to address.

Automated jailbreak attacks on LLMs

Utilizing uncommon text-organization structures

Enhancing attack success rate significantly

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uncommon Text-Organization Structures

Automated jailbreak tool

Escalating attack strategies

🔎 Similar Papers

Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation