IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents

📅 2025-08-21

📈 Citations: 0

✨ Influential: 0

career value

200K/year

🤖 AI Summary

Large language model (LLM) agents are vulnerable to indirect prompt injection (IPI) attacks when invoking untrusted external tools; existing defenses rely on the model’s internal safety assumptions and lack structural constraints on tool invocation behavior. Method: We propose a novel defense mechanism based on a Tool Dependency Graph (TDG), which decouples action planning from data interaction. A pre-defined TDG explicitly encodes task execution logic, and runtime tool invocations are strictly constrained to follow valid graph traversals. Contribution/Results: This approach breaks from traditional model-centric defense paradigms and introduces the first explicit, verifiable constraint on tool-call freedom. Evaluated on the AgentDojo benchmark, our method significantly reduces unintended tool invocations while achieving an optimal trade-off between effectiveness and robustness.

Technology Category

Application Category

📝 Abstract

Large language model (LLM) agents are widely deployed in real-world applications, where they leverage tools to retrieve and manipulate external data for complex tasks. However, when interacting with untrusted data sources (e.g., fetching information from public websites), tool responses may contain injected instructions that covertly influence agent behaviors and lead to malicious outcomes, a threat referred to as Indirect Prompt Injection (IPI). Existing defenses typically rely on advanced prompting strategies or auxiliary detection models. While these methods have demonstrated some effectiveness, they fundamentally rely on assumptions about the model's inherent security, which lacks structural constraints on agent behaviors. As a result, agents still retain unrestricted access to tool invocations, leaving them vulnerable to stronger attack vectors that can bypass the security guardrails of the model. To prevent malicious tool invocations at the source, we propose a novel defensive task execution paradigm, called IPIGuard, which models the agents' task execution process as a traversal over a planned Tool Dependency Graph (TDG). By explicitly decoupling action planning from interaction with external data, IPIGuard significantly reduces unintended tool invocations triggered by injected instructions, thereby enhancing robustness against IPI attacks. Experiments on the AgentDojo benchmark show that IPIGuard achieves a superior balance between effectiveness and robustness, paving the way for the development of safer agentic systems in dynamic environments.

Problem

Research questions and friction points this paper is trying to address.

Defending LLM agents against indirect prompt injection attacks

Preventing malicious tool invocations from untrusted data sources

Enhancing robustness while maintaining task execution effectiveness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tool Dependency Graph-based defense paradigm

Decouples action planning from external data

Reduces unintended tool invocations significantly

🔎 Similar Papers

Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models