HarmNet: A Framework for Adaptive Multi-Turn Jailbreak Attacks on Large Language Models

2025-10-21
AI Summary
Large language models (LLMs) remain highly vulnerable to jailbreak attacks in multi-turn interactions. Method: This paper proposes HarmNet, an adaptive multi-turn attack framework grounded in semantic modeling and dynamic path optimization. It introduces (1) ThoughtNet, a hierarchical semantic network that explicitly models attacker intent and constraints; (2) an iterative query optimization mechanism driven by a feedback simulator; and (3) a real-time adaptive network traversal strategy that lets attack paths evolve dynamically and be searched at fine granularity. Results: HarmNet achieves a 99.4% attack success rate on Mistral-7B, 13.9% higher than the best baseline. It consistently surpasses state-of-the-art methods across both open- and closed-source LLMs, demonstrating the effectiveness and generalizability of semantically guided multi-turn attacks.

Abstract
Large Language Models (LLMs) remain vulnerable to multi-turn jailbreak attacks. We introduce HarmNet, a modular framework comprising ThoughtNet, a hierarchical semantic network; a feedback-driven Simulator for iterative query refinement; and a Network Traverser for real-time adaptive attack execution. HarmNet systematically explores and refines the adversarial space to uncover stealthy, high-success attack paths. Experiments across closed-source and open-source LLMs show that HarmNet outperforms state-of-the-art methods, achieving higher attack success rates. For example, on Mistral-7B, HarmNet achieves a 99.4% attack success rate, 13.9% higher than the best baseline.
Index terms: jailbreak attacks; large language models; adversarial framework; query refinement.
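The abstract does not say whether "13.9% higher" is an absolute gap in percentage points or a relative improvement. The sketch below, which uses only the two figures quoted above, shows the baseline success rate that each reading would imply. The function names are hypothetical, and neither implied baseline is a number reported in the source.

```python
def implied_baseline_absolute(asr: float, gain_points: float) -> float:
    """Baseline ASR if the gain is an absolute gap in percentage points."""
    return round(asr - gain_points, 2)


def implied_baseline_relative(asr: float, gain_percent: float) -> float:
    """Baseline ASR if the gain is a relative improvement over the baseline."""
    return round(asr / (1 + gain_percent / 100), 2)


if __name__ == "__main__":
    asr, gain = 99.4, 13.9  # figures quoted in the abstract (Mistral-7B)
    # Assumption: these are two possible readings, not values from the paper.
    print("absolute reading:", implied_baseline_absolute(asr, gain))
    print("relative reading:", implied_baseline_relative(asr, gain))
```

The two readings differ by less than two points here, but the distinction matters when comparing against baselines with much lower success rates.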
Problem

Research questions and friction points this paper is trying to address.

Addressing multi-turn jailbreak vulnerabilities in large language models
Developing an adaptive framework to systematically explore adversarial attack paths
Improving attack success rates through iterative query refinement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical semantic network for multi-turn attacks
Feedback-driven simulator for iterative query refinement
Network traverser for real-time adaptive execution
Authors

Sidhant Narula
Old Dominion University, Norfolk, VA, USA

J. Asl
Old Dominion University, Norfolk, VA, USA

Mohammad Ghasemigol
Old Dominion University, Norfolk, VA, USA

Eduardo Blanco
University of Arizona
Natural language processing, Computational semantics

Daniel Takabi
Professor and Director of School of Cybersecurity, Old Dominion University
Trustworthy AI, Information Security & Privacy, Usable Security and Privacy