🤖 AI Summary
This study systematically evaluates prompt injection vulnerabilities in eight mainstream commercial large language models (LLMs), including multimodal (image) inputs, across four attack categories: direct injection, indirect (external) injection, image-based injection, and prompt leakage. Each model is tested in a black-box setting without supplementary sanitization, so the results reflect only its built-in safeguards. All evaluated models exhibit exploitable weaknesses. Claude 3 is comparatively the most robust, yet it still requires additional defenses, such as input normalization, to achieve reliable protection. The work contributes an empirical, side-by-side comparison of how commercial LLMs withstand prompt injection.
📝 Abstract
Large Language Models (LLMs) have seen rapid adoption in recent years, with industries increasingly relying on them to maintain a competitive advantage. These models excel at interpreting user instructions and generating human-like responses, leading to their integration across diverse domains, including consulting and information retrieval. However, their widespread deployment also introduces substantial security risks, most notably in the form of prompt injection and jailbreak attacks.
To systematically evaluate LLM vulnerabilities, particularly to external prompt injection, we conducted a series of experiments on eight commercial models. Each model was tested without supplementary sanitization, relying solely on its built-in safeguards. The results exposed exploitable weaknesses and emphasized the need for stronger security measures. Four categories of attacks were examined: direct injection, indirect (external) injection, image-based injection, and prompt leakage. Comparative analysis indicated that Claude 3 was relatively more robust than the other models; nevertheless, the empirical findings confirm that additional defenses, such as input normalization, remain necessary to achieve reliable protection.
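The abstract names input normalization as a supplementary defense but does not say what it involves. As an illustration only, the sketch below shows one common interpretation: canonicalizing Unicode, dropping invisible format characters, and collapsing whitespace before text reaches the model. The function name `normalize_input` and the specific steps are assumptions, not the authors' method, and normalization on its own does not stop prompt injection; it only removes some obfuscation tricks.

```python
import re
import unicodedata

# Hypothetical helper: the source does not describe its normalization steps,
# so every step below is an illustrative assumption.
_WHITESPACE = re.compile(r"\s+")


def normalize_input(text: str) -> str:
    """Return a canonicalized copy of untrusted text."""
    # Fold compatibility forms (for example fullwidth letters) into their
    # standard equivalents so look-alike characters cannot hide keywords.
    text = unicodedata.normalize("NFKC", text)

    cleaned = []
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Cf":
            # Drop invisible format characters such as zero-width spaces.
            continue
        if category == "Cc":
            # Turn control characters (newlines, tabs, etc.) into spaces.
            cleaned.append(" ")
            continue
        cleaned.append(ch)

    # Collapse runs of whitespace and trim the ends.
    return _WHITESPACE.sub(" ", "".join(cleaned)).strip()


if __name__ == "__main__":
    # Fullwidth "Ignore" followed by a zero-width space and ragged spacing.
    raw = "\uff29\uff47\uff4e\uff4f\uff52\uff45\u200b  previous\n instructions"
    print(normalize_input(raw))
```

In a deployment, a step like this would typically run on user input and on retrieved external content alike, ahead of any keyword or classifier-based filtering that expects canonical text.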