Multi-Turn Jailbreaking of Aligned LLMs via Lexical Anchor Tree Search

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the limitations of existing jailbreaking methods, which rely on attacker-side large language models (LLMs) to generate adversarial queries and therefore suffer from high computational cost, excessive query counts, and poor interpretability. The authors reformulate jailbreaking as a breadth-first tree search over multi-turn dialogues and propose a lexical anchor injection mechanism that operates without an auxiliary attacker LLM. By incrementally embedding content words from the attack goal into benign prompts across dialogue rounds, the method exploits the multi-turn conversation structure as an attack surface. Evaluated on AdvBench and HarmBench, the approach achieves 97-100% attack success rates against GPT, Claude, and Llama-family models with an average of only about 6.4 queries, compared with the 20+ queries required by other methods.

📝 Abstract
Most jailbreak methods achieve high attack success rates (ASR) but require attacker LLMs to craft adversarial queries and/or demand high query budgets. These resource requirements make jailbreaking expensive, and the queries generated by attacker LLMs often consist of non-interpretable random prefixes. This paper introduces Lexical Anchor Tree Search (LATS), which addresses these limitations through an attacker-LLM-free method that operates purely via lexical anchor injection. LATS reformulates jailbreaking as a breadth-first tree search over multi-turn dialogues, where each node incrementally injects missing content words from the attack goal into benign prompts. Evaluations on AdvBench and HarmBench demonstrate that LATS achieves 97-100% ASR on the latest GPT, Claude, and Llama models with an average of only ~6.4 queries, compared to the 20+ queries required by other methods. These results highlight conversational structure as a potent and under-protected attack surface, while demonstrating superior query efficiency in an era where high ASR is readily achievable. Our code will be released to support reproducibility.
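
The two headline metrics in the abstract, ASR and average query count, are simple aggregates over per-goal evaluation records. The sketch below shows one plausible way to compute them. It is not the authors' evaluation code: the `AttackRecord` schema and both function names are hypothetical, and it assumes the query average is taken over all attempts (the abstract does not say whether failed attempts are included).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class AttackRecord:
    """Outcome for one benchmark goal (hypothetical schema, not from the paper)."""
    goal_id: str
    success: bool  # whether a judge marked the attempt as successful
    queries: int   # number of queries sent to the target model


def attack_success_rate(records):
    """Percentage of goals whose attempt was judged successful."""
    if not records:
        raise ValueError("no records to aggregate")
    return 100.0 * sum(r.success for r in records) / len(records)


def average_queries(records):
    """Mean query count over all attempts (assumption: failures included)."""
    if not records:
        raise ValueError("no records to aggregate")
    return mean(r.queries for r in records)


# Illustrative values only; these are NOT results from the paper.
records = [
    AttackRecord("g1", True, 4),
    AttackRecord("g2", True, 6),
    AttackRecord("g3", False, 10),
    AttackRecord("g4", True, 8),
]
print(attack_success_rate(records), average_queries(records))
```

Reporting both numbers together matters for the abstract's argument: when most methods reach a high ASR, the query average is what separates them.
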
Problem

Research questions and friction points this paper is trying to address.

jailbreak
aligned LLMs
adversarial queries
query efficiency
multi-turn dialogue
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lexical Anchor Tree Search
jailbreaking
multi-turn dialogue
query efficiency
aligned LLMs