🤖 AI Summary
This work identifies a novel jailbreaking threat to large language models (LLMs) under task concurrency: an adversary can interleave a benign and a harmful request at the word level, so that adjacent words encode divergent intents, concealing the malicious objective from safety guardrails. Building on this observation, the authors propose JAIL-CON, an iterative attack framework that jailbreaks LLMs via task concurrency. Evaluations on mathematical and general question-answering benchmarks show that LLMs retain strong utility on concurrent tasks, and experiments on widely used LLMs demonstrate that JAIL-CON achieves stronger jailbreak capabilities than existing attacks. Moreover, when a guardrail is applied as a defense, the concurrent answers produced by JAIL-CON are stealthier and less detectable than the sequential answers of prior attacks. These findings expose a previously overlooked vulnerability and extend LLM safety evaluation from sequential to concurrent task paradigms.
📝 Abstract
Despite their superior performance across a wide range of domains, large language models (LLMs) remain vulnerable to misuse for generating harmful content, a risk that has been further amplified by various jailbreak attacks. Existing jailbreak attacks mainly follow sequential logic, where LLMs understand and answer each given task one by one. However, concurrency, a natural extension of the sequential scenario, has been largely overlooked. In this work, we first propose a word-level method to enable task concurrency in LLMs, where adjacent words encode divergent intents. Although LLMs maintain strong utility in answering concurrent tasks, as demonstrated by our evaluations on mathematical and general question-answering benchmarks, we notably observe that combining a harmful task with a benign one significantly reduces the probability that it is filtered by the guardrail, revealing the potential risks associated with concurrency in LLMs. Based on these findings, we introduce $\texttt{JAIL-CON}$, an iterative attack framework that $\underline{\text{JAIL}}$breaks LLMs via task $\underline{\text{CON}}$currency. Experiments on widely-used LLMs demonstrate the strong jailbreak capabilities of $\texttt{JAIL-CON}$ compared to existing attacks. Furthermore, when the guardrail is applied as a defense, compared to the sequential answers generated by previous attacks, the concurrent answers in our $\texttt{JAIL-CON}$ exhibit greater stealthiness and are less detectable by the guardrail, highlighting the unique feature of task concurrency in jailbreaking LLMs.
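To make the word-level concurrency idea concrete, here is a minimal sketch of how two tasks might be merged so that adjacent words carry divergent intents, and how the two word streams could be recovered afterwards. The exact encoding used by JAIL-CON is not specified in the abstract, so the functions below (`interleave`, `deinterleave`) are illustrative assumptions, not the paper's actual method.

```python
def interleave(task_a: str, task_b: str) -> str:
    """Merge two tasks word-by-word into one concurrent prompt,
    so adjacent words come from different tasks."""
    a, b = task_a.split(), task_b.split()
    merged = []
    for i in range(max(len(a), len(b))):
        if i < len(a):
            merged.append(a[i])
        if i < len(b):
            merged.append(b[i])
    return " ".join(merged)


def deinterleave(merged: str) -> tuple[str, str]:
    """Recover the two word streams from a concurrent prompt.
    Assumes both tasks contain the same number of words, so the
    even/odd positions split cleanly."""
    words = merged.split()
    return " ".join(words[0::2]), " ".join(words[1::2])
```

For example, `interleave("a b", "x y")` yields `"a x b y"`, and `deinterleave` on that string returns the two original tasks. A model answering such a prompt concurrently would need to track both intents, which is the behavior the paper exploits.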