Multi-turn Jailbreaking via Global Refinement and Active Fabrication

📅 2025-06-21
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing jailbreaking attacks primarily target single-turn scenarios and adapt poorly to dynamic multi-turn dialogues. To address this limitation, this paper proposes the first multi-turn jailbreaking framework capable of global path optimization and proactive response construction. The method combines gradient-based search, context-aware prompt optimization, and response-forgery techniques, using real-time model feedback to coordinate input adjustments across turns, thereby suppressing safety warnings and raising the success rate of harmful content generation. Evaluated on six mainstream large language models, the approach significantly outperforms existing single-turn and multi-turn baselines, demonstrating superior persistence, stealth, and generalizability. The implementation is publicly available.

๐Ÿ“ Abstract
Large Language Models (LLMs) have achieved exceptional performance across a wide range of tasks. However, they still pose significant safety risks due to their potential misuse for malicious purposes. Jailbreaks, which aim to elicit harmful content from models, play a critical role in identifying the underlying security threats. Recent jailbreaking research primarily focuses on single-turn scenarios, while the more complicated multi-turn scenarios remain underexplored. Moreover, existing multi-turn jailbreaking techniques struggle to adapt to the evolving dynamics of a dialogue as the interaction progresses. To address this limitation, we propose a novel multi-turn jailbreaking method that refines the jailbreaking path globally at each interaction. We also actively fabricate model responses to suppress safety-related warnings, thereby increasing the likelihood of eliciting harmful outputs in subsequent questions. Experimental results demonstrate the superior performance of our method compared with existing single-turn and multi-turn jailbreaking techniques across six state-of-the-art LLMs. Our code is publicly available at https://github.com/Ytang520/Multi-Turn_jailbreaking_Global-Refinment_and_Active-Fabrication.
Problem

Research questions and friction points this paper is trying to address.

Addressing multi-turn jailbreaking risks in LLMs
Refining the jailbreaking path globally rather than turn by turn
Suppressing safety warnings to elicit harmful outputs in later turns
Innovation

Methods, ideas, or system contributions that make the work stand out.

Global refinement of jailbreaking path
Active fabrication of model responses
Suppression of safety-related warnings
🔎 Similar Papers
No similar papers found.