AI Summary
Large language models (LLMs) exhibit significant vulnerability to jailbreak attacks in multi-turn interactions. Method: This paper proposes HarmNet, an adaptive multi-turn attack framework grounded in semantic modeling and dynamic path optimization. It introduces (1) ThoughtNet, a hierarchical semantic network that explicitly models attacker intent and constraints; (2) a feedback simulator-driven iterative query optimization mechanism; and (3) a real-time adaptive network traversal strategy enabling dynamic evolution and fine-grained search of attack paths. Results: HarmNet achieves a 99.4% attack success rate on Mistral-7B, outperforming the best baseline by 13.9%. It consistently surpasses state-of-the-art methods across both open- and closed-source LLMs, demonstrating the effectiveness and generalizability of semantics-guided multi-turn attack paradigms.
Abstract
Large Language Models (LLMs) remain vulnerable to multi-turn jailbreak attacks. We introduce HarmNet, a modular framework comprising ThoughtNet, a hierarchical semantic network; a feedback-driven Simulator for iterative query refinement; and a Network Traverser for real-time adaptive attack execution. HarmNet systematically explores and refines the adversarial space to uncover stealthy, high-success attack paths. Experiments across closed-source and open-source LLMs show that HarmNet consistently outperforms state-of-the-art methods in attack success rate. For example, on Mistral-7B, HarmNet achieves a 99.4% attack success rate, 13.9% higher than the best baseline.
Index Terms: jailbreak attacks; large language models; adversarial framework; query refinement.
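The abstract describes an architecture of three interacting parts: a hierarchical semantic network of intents, a feedback simulator that scores candidate queries, and a traverser that adaptively walks the network while refining each query. A minimal sketch of how such a traverse-and-refine loop could fit together is shown below. All names and signatures here (`ThoughtNode`, `traverse_and_refine`, the `simulator` and `refine` callables) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a HarmNet-style adaptive traverse-and-refine loop.
# The real system's components (ThoughtNet, Simulator, Network Traverser)
# are stand-ins here; this only illustrates the control flow the abstract
# describes: walk the semantic network, and at each node iteratively
# rewrite the query until simulated feedback clears a threshold.

from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ThoughtNode:
    """One node in the hierarchical semantic network (an intent/constraint)."""
    text: str
    children: List["ThoughtNode"] = field(default_factory=list)


def traverse_and_refine(
    root: ThoughtNode,
    simulator: Callable[[str], float],    # feedback score in [0, 1]
    refine: Callable[[str, float], str],  # rewrites a query given its score
    threshold: float = 0.9,
    max_turns: int = 5,
) -> List[str]:
    """Depth-first traversal; each node's query is refined until the
    simulated feedback score reaches the threshold (or turns run out)."""
    path: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        query = node.text
        score = simulator(query)
        for _ in range(max_turns):
            if score >= threshold:
                break
            query = refine(query, score)
            score = simulator(query)
        path.append(query)
        # push children in reverse so they are visited left-to-right
        stack.extend(reversed(node.children))
    return path
```

The key design point the abstract implies is that refinement is driven by simulated feedback rather than the target model's responses alone, which lets the traversal prune low-scoring paths before committing real query turns.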