AI Summary
This study addresses the vulnerability of large language models (LLMs) to prompt-injection and jailbreaking attacks in deployment scenarios, which pose significant security risks. We construct a large-scale, human-annotated dataset of such attacks and present the first systematic evaluation of multiple open-source LLMs' susceptibility to them. Our analysis reveals notable differences in models' safety response behaviors, including tendencies toward silence or refusal, which we attribute to internal architectural mechanisms. Furthermore, we evaluate various lightweight, inference-time defense strategies that require no model retraining. While these defenses effectively mitigate simple attacks, they remain susceptible to evasion when confronted with long-context or high-complexity prompts, highlighting critical limitations in current defensive approaches.
Abstract
Large Language Models (LLMs) are widely deployed in real-world systems. Given their broad applicability, prompt engineering has become an efficient way for resource-constrained organizations to adapt LLMs to their own purposes. At the same time, LLMs are vulnerable to prompt-based attacks, making the analysis of this risk a critical security requirement. This work evaluates prompt-injection and jailbreak vulnerability using a large, manually curated dataset across multiple open-source LLMs, including Phi, Mistral, DeepSeek-R1, Llama 3.2, Qwen, and Gemma variants. We observe significant behavioural variation across models, including refusal responses and complete silent non-responsiveness triggered by internal safety mechanisms. Furthermore, we evaluate several lightweight, inference-time defence mechanisms that operate as filters, requiring no retraining or GPU-intensive fine-tuning. Although these defences mitigate straightforward attacks, they are consistently bypassed by long, reasoning-heavy prompts.