🤖 AI Summary
This work addresses the vulnerability of AI agents to indirect prompt injection attacks, which jeopardize the safe execution of consequential, high-stakes actions. Existing system-level defenses can provably block unsafe actions, but they currently appear costly: they reduce task completion rates and are evaluated without accounting for how much human intervention they save. To capture this overlooked benefit, we propose a security-aware agent architecture that explicitly balances task progress against security constraints through integrated information-flow control, policy-compliant planning, and richer human-in-the-loop interactions. We introduce a novel metric, "autonomy," quantifying the fraction of consequential actions an agent can safely execute without human oversight, and design the agent to increase it by planning jointly for task progress and policy compliance. Experimental results on the AgentDojo and WASP benchmarks demonstrate that our approach significantly improves autonomy while preserving task utility.
📝 Abstract
Indirect prompt injection attacks threaten AI agents that execute consequential actions, motivating deterministic system-level defenses. Such defenses can provably block unsafe actions by enforcing confidentiality and integrity policies, but currently appear costly: they reduce task completion rates and increase token usage compared to probabilistic defenses. We argue that existing evaluations miss a key benefit of system-level defenses: reduced reliance on human oversight. We introduce autonomy metrics to quantify this benefit: the fraction of consequential actions an agent can execute without human-in-the-loop (HITL) approval while preserving security. To increase autonomy, we design a security-aware agent that (i) introduces richer HITL interactions, and (ii) explicitly plans for both task progress and policy compliance. We implement this agent design atop an existing information-flow control defense against prompt injection and evaluate it on the AgentDojo and WASP benchmarks. Experiments show that this approach yields higher autonomy without sacrificing utility.
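The autonomy metric defined above, the fraction of consequential actions executed without HITL approval, can be sketched as a simple ratio over an action log. The record format below (dicts with `consequential` and `needed_hitl` flags) is an illustrative assumption for this sketch, not the paper's actual implementation:

```python
def autonomy(action_log):
    """Fraction of consequential actions executed without HITL approval.

    Assumed log format: each entry is a dict with boolean flags
    'consequential' (is the action high-stakes?) and 'needed_hitl'
    (did the agent escalate it to a human for approval?).
    """
    consequential = [a for a in action_log if a["consequential"]]
    if not consequential:
        # No consequential actions: treat as fully autonomous by convention.
        return 1.0
    autonomous = sum(1 for a in consequential if not a["needed_hitl"])
    return autonomous / len(consequential)

# Toy trace: two consequential actions, one escalated to a human.
log = [
    {"consequential": True,  "needed_hitl": False},  # e.g., sent an email under policy
    {"consequential": True,  "needed_hitl": True},   # e.g., escalated a payment
    {"consequential": False, "needed_hitl": False},  # read-only step, not counted
]
print(autonomy(log))  # → 0.5
```

A security-aware agent raises this score by proving more actions policy-compliant up front, so fewer of them require escalation.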