Indirect Prompt Injection in the Wild: An Empirical Study of Prevalence, Techniques, and Objectives

📅 2026-04-29

📈 Citations: 0

✨ Influential: 0

career value

192K/year

🤖 AI Summary

This study addresses the susceptibility of large language models (LLMs) to indirect prompt injection attacks via web content, a threat whose real-world prevalence and impact remain poorly understood. Conducting a large-scale empirical analysis across 1.2 billion URLs, the authors systematically uncover the widespread presence of non-human-visible, AI-targeted instructions embedded in webpages, identifying and validating over 15,000 real-world injection instances. Through web crawling, automated detection, controlled experiments with 13 LLMs under four webpage representation formats, and compliance assessments, they find that approximately 70% of injections reside in invisible HTML regions. While smaller models exhibit instruction-following rates as high as 8% under plain-text input, structured webpage representations significantly mitigate this risk. The work further characterizes attacker objectives, propagation mechanisms, and real-world templates, providing an empirical foundation for effective defenses.

📝 Abstract

As LLMs are increasingly integrated into systems that browse, retrieve, summarize, and act on web content, webpages have become an untrusted input vector for downstream model behavior. This enables site owners, contributors, and adversaries to embed instructions directly in web resources, i.e., indirect prompt injections. While prior work demonstrates such attacks in controlled settings, their prevalence, deployment, and real-world impact remain unclear. We present one of the first large-scale empirical analyses of indirect prompt injections in webpages and HTTP responses. Analyzing 1.2B URLs from 24.8M hosts, we identify 15.3K validated instances across 11.7K pages. These are not isolated cases: a small number of recurring templates account for most cases. We characterize their objectives, delivery mechanisms, visibility, persistence, and impact, revealing a heterogeneous ecosystem spanning disruptive prompts, reputation manipulation, content-protection directives, and AI-bot detection, targeting systems such as crawlers, search pipelines, customer-support agents, and hiring workflows. A key finding is that most instructions target machines rather than humans: about 70% appear in non-rendered HTML (e.g., headers, comments, metadata), and many visible cases are hidden via rendering techniques. To assess practical risk, we run 5,200 controlled experiments across 13 models and four webpage representations. Our results show compliance is limited but non-negligible, reaching up to 8% for smaller models on plain-text inputs, while structured representations reduce compliance by preserving structural cues. Overall, prompt-based interference is already present in the web ecosystem and represents a growing source of tension between LLM-driven automation and the sites it consumes.

Problem

Research questions and friction points this paper is trying to address.

Indirect Prompt Injection

Large Language Models

Web Security

Prompt Interference

LLM Integration

Innovation

Methods, ideas, or system contributions that make the work stand out.

indirect prompt injection

large language models

web security