SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression

📅 2025-06-15

📈 Citations: 0

✨ Influential: 0

career value

230K/year

🤖 AI Summary

Large language models (LLMs) are vulnerable to jailbreaking prompt attacks, while existing defenses often compromise practicality or incur substantial computational overhead. Method: We propose a security-aware, lightweight prompt compression defense framework that preserves the user’s original input. Instead of modifying prompts, our approach introduces a novel intent-aware compressor to explicitly extract and represent latent malicious intent within prompts; this compressed intent representation is then injected—zero-intrusively—as a system prompt to activate the model’s built-in safety mechanisms. Technically, it integrates intent-discrimination fine-tuning, dual-path prompt injection, and an end-to-end trainable compression architecture. Results: On mainstream jailbreaking benchmarks, our method achieves >92% defense success rate, with <0.5% task performance degradation, <0.3% additional token overhead, and only +3ms online latency—significantly outperforming state-of-the-art defenses.

Technology Category

Application Category

📝 Abstract

Large language models (LLMs) have achieved widespread adoption across numerous applications. However, many LLMs are vulnerable to malicious attacks even after safety alignment. These attacks typically bypass LLMs' safety guardrails by wrapping the original malicious instructions inside adversarial jailbreaks prompts. Previous research has proposed methods such as adversarial training and prompt rephrasing to mitigate these safety vulnerabilities, but these methods often reduce the utility of LLMs or lead to significant computational overhead and online latency. In this paper, we propose SecurityLingua, an effective and efficient approach to defend LLMs against jailbreak attacks via security-oriented prompt compression. Specifically, we train a prompt compressor designed to discern the"true intention"of the input prompt, with a particular focus on detecting the malicious intentions of adversarial prompts. Then, in addition to the original prompt, the intention is passed via the system prompt to the target LLM to help it identify the true intention of the request. SecurityLingua ensures a consistent user experience by leaving the original input prompt intact while revealing the user's potentially malicious intention and stimulating the built-in safety guardrails of the LLM. Moreover, thanks to prompt compression, SecurityLingua incurs only a negligible overhead and extra token cost compared to all existing defense methods, making it an especially practical solution for LLM defense. Experimental results demonstrate that SecurityLingua can effectively defend against malicious attacks and maintain utility of the LLM with negligible compute and latency overhead. Our code is available at https://aka.ms/SecurityLingua.

Problem

Research questions and friction points this paper is trying to address.

Defends LLMs against adversarial jailbreak attacks efficiently

Detects malicious intentions in prompts via compression

Minimizes computational overhead while maintaining LLM utility

Innovation

Methods, ideas, or system contributions that make the work stand out.

Security-aware prompt compression for LLM defense

Training compressor to detect adversarial intentions

Minimal overhead with prompt compression technique

🔎 Similar Papers

Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks