Analysis of LLMs Against Prompt Injection and Jailbreak Attacks

📅 2026-02-24
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the vulnerability of large language models (LLMs) to prompt injection and jailbreaking attacks in deployment scenarios, which pose significant security risks. We construct a large-scale, human-annotated dataset of such attacks and present the first systematic evaluation of multiple open-source LLMs' susceptibility to them. Our analysis reveals notable differences in models' safety response behaviors, including tendencies toward silence or refusal, which we attribute to internal architectural mechanisms. Furthermore, we evaluate various lightweight, inference-time defense strategies that require no model retraining. While these defenses effectively mitigate simple attacks, they remain susceptible to evasion when confronted with long-context or high-complexity prompts, thereby highlighting critical limitations in current defensive approaches.

Technology Category

Application Category

๐Ÿ“ Abstract
Large Language Models (LLMs) are widely deployed in real-world systems. Given their broad applicability, prompt engineering has become an efficient way for resource-scarce organizations to adopt LLMs for their own purposes. At the same time, LLMs are vulnerable to prompt-based attacks, so analyzing this risk has become a critical security requirement. This work evaluates prompt-injection and jailbreak vulnerability using a large, manually curated dataset across multiple open-source LLMs, including Phi, Mistral, DeepSeek-R1, Llama 3.2, Qwen, and Gemma variants. We observe significant behavioural variation across models, including refusal responses and complete, silent non-responsiveness triggered by internal safety mechanisms. Furthermore, we evaluate several lightweight, inference-time defence mechanisms that operate as filters, requiring no retraining or GPU-intensive fine-tuning. Although these defences mitigate straightforward attacks, they are consistently bypassed by long, reasoning-heavy prompts.
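The abstract describes the evaluated defences as inference-time filters that sit in front of the model without any retraining. A minimal sketch of what such a filter can look like is shown below; the patterns, function names, and refusal message are illustrative assumptions, not the paper's actual rules.

```python
import re

# Hypothetical heuristic pre-screen for prompt-injection / jailbreak attempts.
# These patterns are illustrative assumptions, not the defences from the paper.
INJECTION_PATTERNS = [
    r"ignore (?:all |any )?(?:previous|prior|above) instructions",
    r"disregard (?:your|the) (?:system prompt|safety guidelines)",
    r"pretend (?:that )?you have no restrictions",
    r"you are now in developer mode",
]

def is_suspicious(prompt: str) -> bool:
    """Return True if the prompt matches any known injection pattern."""
    lowered = prompt.lower()
    return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)

def guarded_generate(prompt: str, generate) -> str:
    """Wrap a model call: refuse flagged prompts instead of forwarding them."""
    if is_suspicious(prompt):
        return "[blocked: possible prompt-injection attempt]"
    return generate(prompt)
```

As the paper notes, simple pattern-based filters like this handle straightforward attacks but are easily evaded by long, reasoning-heavy prompts that never use the trigger phrasing.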
Problem

Research questions and friction points this paper is trying to address.

prompt injection
jailbreak attacks
large language models
security vulnerability
adversarial prompts
Innovation

Methods, ideas, or system contributions that make the work stand out.

prompt injection
jailbreak attacks
LLM security
inference-time defense
safety mechanisms
🔎 Similar Papers
No similar papers found.