Chain-of-Trigger: An Agentic Backdoor that Paradoxically Enhances Agentic Robustness

📅 2025-10-09

📈 Citations: 0

✨ Influential: 0

career value

235K/year

🤖 AI Summary

This work exposes a critical security vulnerability in large language model (LLM)-based intelligent agents: susceptibility to multi-step, long-horizon backdoor attacks in realistic, interactive environments. To this end, we propose Chain-of-Trigger (CoTri), the first backdoor framework enabling dynamic, environment-aware multi-step control. CoTri models environmental stochasticity to design an ordered sequence of triggers, leveraging agent feedback to progressively steer behavior away from the intended task. Counterintuitively, the backdoor simultaneously enhances the agent’s robustness and performance on clean tasks—thereby significantly improving stealth. Extensive experiments demonstrate that CoTri achieves near-perfect attack success rates (~100%) with negligible false-positive triggering across both LLM-based and vision-language model agents. Moreover, it exhibits strong cross-modal generalizability. CoTri establishes a new paradigm for agent security evaluation and provides a potent, realistic benchmark for assessing backdoor resilience in autonomous agents.

Technology Category

Application Category

📝 Abstract

The rapid deployment of large language model (LLM)-based agents in real-world applications has raised serious concerns about their trustworthiness. In this work, we reveal the security and robustness vulnerabilities of these agents through backdoor attacks. Distinct from traditional backdoors limited to single-step control, we propose the Chain-of-Trigger Backdoor (CoTri), a multi-step backdoor attack designed for long-horizon agentic control. CoTri relies on an ordered sequence. It starts with an initial trigger, and subsequent ones are drawn from the environment, allowing multi-step manipulation that diverts the agent from its intended task. Experimental results show that CoTri achieves a near-perfect attack success rate (ASR) while maintaining a near-zero false trigger rate (FTR). Due to training data modeling the stochastic nature of the environment, the implantation of CoTri paradoxically enhances the agent's performance on benign tasks and even improves its robustness against environmental distractions. We further validate CoTri on vision-language models (VLMs), confirming its scalability to multimodal agents. Our work highlights that CoTri achieves stable, multi-step control within agents, improving their inherent robustness and task capabilities, which ultimately makes the attack more stealthy and raises potential safty risks.

Problem

Research questions and friction points this paper is trying to address.

Proposes multi-step backdoor attack for long-horizon agentic control

Reveals security vulnerabilities in LLM-based agents through backdoors

Achieves high attack success while enhancing benign task performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-step backdoor attack for agentic control

Ordered trigger sequence from environment interactions

Paradoxically enhances agent robustness and performance

🔎 Similar Papers

No similar papers found.