🤖 AI Summary
Large language models (LLMs) exhibit persistent structural safety vulnerabilities under multi-turn jailbreaking attacks, yet existing approaches lack a systematic understanding of how conversational patterns correlate with model weaknesses.
Method: We propose Pattern Enhanced Chain of Attack (PE-CoA), the first framework to systematically distill five reusable dialogue attack patterns (e.g., pedagogical discussion and hypothetical scenario framing) and map them to vulnerabilities across harm categories (e.g., malware generation, fraud). PE-CoA combines multi-turn dialogue modeling with pattern-driven attack generation.
Contribution/Results: Evaluated on 12 mainstream LLMs across 10 harm categories, PE-CoA achieves state-of-the-art attack success rates. We show empirically that robustness to one conversational pattern does not generalize to others, while models within the same architecture family share consistent failure modes, revealing critical limitations of current safety fine-tuning. These findings provide an empirical basis for pattern-aware defense mechanisms.
📝 Abstract
Large language models (LLMs) remain vulnerable to multi-turn jailbreaking attacks that exploit conversational context to gradually bypass safety constraints. These attacks target different harm categories (such as malware generation, harassment, or fraud) through distinct conversational approaches (educational discussions, personal experiences, hypothetical scenarios). Existing multi-turn jailbreaking methods often rely on heuristic or ad hoc exploration strategies, providing limited insight into underlying model weaknesses. The relationship between conversation patterns and model vulnerabilities across harm categories remains poorly understood. We propose Pattern Enhanced Chain of Attack (PE-CoA), a framework of five conversation patterns for constructing effective multi-turn jailbreaks through natural dialogue. Evaluating PE-CoA on twelve LLMs spanning ten harm categories, we achieve state-of-the-art performance and uncover pattern-specific vulnerabilities and LLM behavioral characteristics: models exhibit distinct weakness profiles in which robustness to one conversational pattern does not generalize to others, and model families share similar failure modes. These findings highlight limitations of safety training and indicate the need for pattern-aware defenses. Code available at: https://github.com/Ragib-Amin-Nihal/PE-CoA