Multimodal Prompt Injection Attacks: Risks and Defenses for Modern LLMs

📅 2025-09-06

📈 Citations: 0

✨ Influential: 0

career value

195K/year

🤖 AI Summary

This study systematically evaluates prompt injection vulnerabilities across eight mainstream commercial large language models (LLMs) in multimodal settings, covering four attack vectors: direct injection, indirect (external) injection, image-based injection, and prompt leakage. We introduce, for the first time, novel threat models specifically tailored to multimodal inputs—namely, external and image-based injection—and empirically assess the effectiveness of built-in safety mechanisms via black-box testing augmented with input normalization and other defensive mitigations. Results reveal that all evaluated models exhibit exploitable vulnerabilities; while Claude 3 demonstrates the strongest resilience, it still requires supplementary defenses—such as input normalization—to achieve robust protection. This work establishes the first large-scale, empirically grounded benchmark and a reproducible methodology framework for security assessment of multimodal LLMs.

Technology Category

Application Category

📝 Abstract

Large Language Models (LLMs) have seen rapid adoption in recent years, with industries increasingly relying on them to maintain a competitive advantage. These models excel at interpreting user instructions and generating human-like responses, leading to their integration across diverse domains, including consulting and information retrieval. However, their widespread deployment also introduces substantial security risks, most notably in the form of prompt injection and jailbreak attacks. To systematically evaluate LLM vulnerabilities -- particularly to external prompt injection -- we conducted a series of experiments on eight commercial models. Each model was tested without supplementary sanitization, relying solely on its built-in safeguards. The results exposed exploitable weaknesses and emphasized the need for stronger security measures. Four categories of attacks were examined: direct injection, indirect (external) injection, image-based injection, and prompt leakage. Comparative analysis indicated that Claude 3 demonstrated relatively greater robustness; nevertheless, empirical findings confirm that additional defenses, such as input normalization, remain necessary to achieve reliable protection.

Problem

Research questions and friction points this paper is trying to address.

Evaluating vulnerabilities of commercial LLMs to multimodal prompt injection attacks

Testing eight models' built-in safeguards against four attack categories

Identifying need for stronger defenses like input normalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated vulnerabilities via multimodal injection attacks

Tested eight commercial models without sanitization

Proposed input normalization for enhanced defense

🔎 Similar Papers

No similar papers found.