Automating Agent Hijacking via Structural Template Injection

📅 2026-02-18

📈 Citations: 0

✨ Influential: 0

career value

196K/year

🤖 AI Summary

This work addresses the security threat of large language model (LLM) agents being hijacked through malicious instructions injected via retrieval-augmented content. Existing approaches rely on manual prompt engineering, resulting in low attack success rates and poor transferability to closed-source models. To overcome these limitations, we propose Phantom, a novel framework that introduces the first automated hijacking method based on structured templates. Phantom exploits role delimiters to induce model confusion, causing malicious payloads to be misinterpreted as legitimate instructions or tool outputs. We further design a Template AutoEncoder (TAE) to map discrete templates into a continuous, searchable space, enabling efficient black-box attacks through Bayesian optimization and multi-level template augmentation. Extensive evaluations on Qwen, GPT, and Gemini demonstrate significant improvements over baselines in both attack success rate and query efficiency. Notably, our method has uncovered confirmed security vulnerabilities in over 70 real-world commercial LLM applications.

Technology Category

Application Category

📝 Abstract

Agent hijacking, highlighted by OWASP as a critical threat to the Large Language Model (LLM) ecosystem, enables adversaries to manipulate execution by injecting malicious instructions into retrieved content. Most existing attacks rely on manually crafted, semantics-driven prompt manipulation, which often yields low attack success rates and limited transferability to closed-source commercial models. In this paper, we propose Phantom, an automated agent hijacking framework built upon Structured Template Injection that targets the fundamental architectural mechanisms of LLM agents. Our key insight is that agents rely on specific chat template tokens to separate system, user, assistant, and tool instructions. By injecting optimized structured templates into the retrieved context, we induce role confusion and cause the agent to misinterpret the injected content as legitimate user instructions or prior tool outputs. To enhance attack transferability against black-box agents, Phantom introduces a novel attack template search framework. We first perform multi-level template augmentation to increase structural diversity and then train a Template Autoencoder (TAE) to embed discrete templates into a continuous, searchable latent space. Subsequently, we apply Bayesian optimization to efficiently identify optimal adversarial vectors that are decoded into high-potency structured templates. Extensive experiments on Qwen, GPT, and Gemini demonstrate that our framework significantly outperforms existing baselines in both Attack Success Rate (ASR) and query efficiency. Moreover, we identified over 70 vulnerabilities in real-world commercial products that have been confirmed by vendors, underscoring the practical severity of structured template-based hijacking and providing an empirical foundation for securing next-generation agentic systems.

Problem

Research questions and friction points this paper is trying to address.

Agent hijacking

Large Language Model

Structured Template Injection

Adversarial Attack

Prompt Injection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured Template Injection

Agent Hijacking

Template Autoencoder