ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents

📅 2025-09-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work identifies a novel attack surface in LLM-based agents arising from their reliance on structured chat templates: adversaries can embed malicious instructions within external environment outputs and, through multi-turn dialogue, induce the agent to misinterpret forged templates as legitimate prompts—enabling indirect prompt injection. Unlike conventional prompt injection attacks, this study is the first to systematically investigate template dependency as a vulnerability, proposing a new injection method combining template obfuscation and multi-turn contextual persuasion. Experiments demonstrate attack success rates of 32.05% on AgentDojo and 45.90% on InjecAgent; the multi-turn variant achieves 52.33%. The attack exhibits strong cross-model transferability, while existing defenses—including input sanitization and guardrails—fail universally, exposing fundamental architectural security flaws in template-driven agent frameworks.

Technology Category

Application Category

📝 Abstract

The growing deployment of large language model (LLM) based agents that interact with external environments has created new attack surfaces for adversarial manipulation. One major threat is indirect prompt injection, where attackers embed malicious instructions in external environment output, causing agents to interpret and execute them as if they were legitimate prompts. While previous research has focused primarily on plain-text injection attacks, we find a significant yet underexplored vulnerability: LLMs' dependence on structured chat templates and their susceptibility to contextual manipulation through persuasive multi-turn dialogues. To this end, we introduce ChatInject, an attack that formats malicious payloads to mimic native chat templates, thereby exploiting the model's inherent instruction-following tendencies. Building on this foundation, we develop a persuasion-driven Multi-turn variant that primes the agent across conversational turns to accept and execute otherwise suspicious actions. Through comprehensive experiments across frontier LLMs, we demonstrate three critical findings: (1) ChatInject achieves significantly higher average attack success rates than traditional prompt injection methods, improving from 5.18% to 32.05% on AgentDojo and from 15.13% to 45.90% on InjecAgent, with multi-turn dialogues showing particularly strong performance at average 52.33% success rate on InjecAgent, (2) chat-template-based payloads demonstrate strong transferability across models and remain effective even against closed-source LLMs, despite their unknown template structures, and (3) existing prompt-based defenses are largely ineffective against this attack approach, especially against Multi-turn variants. These findings highlight vulnerabilities in current agent systems.

Problem

Research questions and friction points this paper is trying to address.

Exploiting chat templates for indirect prompt injection in LLMs

Using persuasive multi-turn dialogues to manipulate agent behavior

Demonstrating vulnerabilities in current LLM agent defense systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Exploits chat templates for indirect prompt injection

Uses multi-turn dialogues to persuade LLM agents

Mimics native templates to bypass existing defenses

🔎 Similar Papers

No similar papers found.

Authors to Follow