AI Summary
Existing jailbreaking attacks primarily focus on single-turn scenarios and lack adaptability to dynamic multi-turn dialogues. To address this limitation, we propose the first multi-turn jailbreaking framework capable of global path optimization and proactive response construction. Our method integrates gradient-based search, context-aware prompt optimization, and response forgery techniques, leveraging real-time model feedback to coordinate input adjustments across turns, thereby suppressing safety warnings and increasing the success rate of harmful content generation. Evaluated on six mainstream large language models, our approach significantly outperforms existing single-turn and multi-turn baseline methods, demonstrating superior persistence, stealth, and generalizability. The implementation is publicly available.
Abstract
Large Language Models (LLMs) have achieved exceptional performance across a wide range of tasks. However, they still pose significant safety risks because they can be misused for malicious purposes. Jailbreaks, which aim to induce models to generate harmful content, play a critical role in identifying the underlying security threats. Recent jailbreaking research primarily focuses on single-turn scenarios, while the more complicated multi-turn scenarios remain underexplored. Moreover, existing multi-turn jailbreaking techniques struggle to adapt to the evolving dynamics of the dialogue as the interaction progresses. To address this limitation, we propose a novel multi-turn jailbreaking method that refines the jailbreaking path globally at each interaction. We also actively fabricate model responses to suppress safety-related warnings, thereby increasing the likelihood of eliciting harmful outputs in subsequent questions. Experimental results demonstrate the superior performance of our method compared with existing single-turn and multi-turn jailbreaking techniques across six state-of-the-art LLMs. Our code is publicly available at https://github.com/Ytang520/Multi-Turn_jailbreaking_Global-Refinment_and_Active-Fabrication.