Multi-turn Jailbreaking via Global Refinement and Active Fabrication

📅 2025-06-21
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing jailbreaking attacks primarily target single-turn scenarios and adapt poorly to dynamic multi-turn dialogues. To address this limitation, this paper proposes the first multi-turn jailbreaking framework capable of global path optimization and proactive response construction. The method combines gradient-based search, context-aware prompt optimization, and response-forgery techniques, using real-time model feedback to coordinate input adjustments across turns, thereby suppressing safety warnings and raising the success rate of harmful content generation. Evaluated on six mainstream large language models, the approach significantly outperforms existing single-turn and multi-turn baselines, demonstrating superior persistence, stealth, and generalizability. The implementation is publicly available.

๐Ÿ“ Abstract
Large Language Models (LLMs) have achieved exceptional performance across a wide range of tasks. However, they still pose significant safety risks due to their potential misuse for malicious purposes. Jailbreaks, which aim to elicit harmful content from models, play a critical role in identifying the underlying security threats. Recent jailbreaking research primarily focuses on single-turn scenarios, while the more complicated multi-turn scenarios remain underexplored. Moreover, existing multi-turn jailbreaking techniques struggle to adapt to the evolving dynamics of a dialogue as the interaction progresses. To address this limitation, we propose a novel multi-turn jailbreaking method that refines the jailbreaking path globally at each interaction. We also actively fabricate model responses to suppress safety-related warnings, thereby increasing the likelihood of eliciting harmful outputs in subsequent questions. Experimental results demonstrate the superior performance of our method compared with existing single-turn and multi-turn jailbreaking techniques across six state-of-the-art LLMs. Our code is publicly available at https://github.com/Ytang520/Multi-Turn_jailbreaking_Global-Refinment_and_Active-Fabrication.
Problem

Research questions and friction points this paper is trying to address.

Addressing multi-turn jailbreaking risks in LLMs
Refining the jailbreaking path globally rather than turn by turn
Suppressing safety warnings to elicit harmful outputs in later turns
Innovation

Methods, ideas, or system contributions that make the work stand out.

Global refinement of jailbreaking path
Active fabrication of model responses
Suppression of safety-related warnings
🔎 Similar Papers
No similar papers found.