🤖 AI Summary
This work addresses the threat of indirect prompt injection in black-box chatbot environments, where attackers, despite lacking access to model weights or system prompts, can hijack agent tasks and exfiltrate private user data. The authors propose a novel prompt injection technique termed “exemplification,” which reframes the user prompt together with the benign beginning of externally retrieved content as few-shot examples to steer the model toward executing malicious instructions. This approach establishes, for the first time, an end-to-end privacy leakage attack chain under black-box conditions by integrating web tool invocation with prompt jailbreaking. In controlled experiments, the method successfully exfiltrated fabricated personal information in its entirety. Results demonstrate that exemplification substantially increases attack success rates, exposing critical privacy vulnerabilities inherent in current agent-based architectures.
📝 Abstract
LLM-based chatbot agents increasingly process user requests by combining natural-language reasoning with external tools such as web browsing. These capabilities improve usability, but they also create attack surfaces when untrusted external content is processed as part of a user's task. This paper studies a privacy-leakage attack chain based on indirect prompt injection in black-box chatbot environments, where the attacker has no access to model weights, system prompts, or agent implementation details, including how the agent manages its trajectory while processing a query. We first analyze how an attacker can hijack an agent's intended task by crafting external content that appears benign to the victim while inducing the agent to execute an attacker-defined objective. We then evaluate a new prompt-injection technique, called exemplification, which uses a bridge in the external content to reframe the user prompt and the benign beginning of the retrieved page as few-shot examples before appending the attacker's objective. We compare its attack success rate with that of a prior fake-completion technique. Finally, we demonstrate a proof-of-concept data-exfiltration chain using fictitious personal information in a controlled setting. Our results suggest that prompt injection, jailbreak-style instruction steering, and web-tool invocation can be combined into a feasible privacy-leakage path in deployed chatbot agents.