AIRGuard: Guarding Agent Actions with Runtime Authority Control

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses privilege escalation in tool-augmented language agents caused by adversarial context manipulation. It introduces AIRGuard, a runtime defense that enforces fine-grained authorization before each action executes. The approach applies the principle of least privilege through tool-call normalization, derivation of step-level permissions from task-level authority, tracking of source and target trust, simulation of side effects for sensitive operations, and cross-step risk auditing, while explicitly separating inputs that inform reasoning from inputs that authorize actions. On AgentTrap, the mechanism reduces the attack success rate against Sonnet 4.6 from 36.3% to 5.5%. On DTAP-150, it preserves 76.0% of benign utility with Haiku 4.5, compared with 52.0% for ARGUS and 42.0% for MELON.

📝 Abstract

Tool-using language agents turn model decisions into external side effects: they read files, run scripts, call APIs, send messages, and invoke Model Context Protocol tools. This makes agent attacks different from jailbreaks. The harmful step is often not an obviously forbidden output, but an ordinary executable action that becomes unsafe because attacker-controlled context steers authorized access against the user's interest. We identify this failure mode as authority confusion: untrusted resources may inform reasoning, but they must not authorize side effects. We present AIRGuard, a runtime guard that operationalizes least privilege as action-time authorization. AIRGuard normalizes heterogeneous tool calls, derives task authority into step-level authority, tracks source and target trust, simulates sensitive side effects, audits cross-step risk, and enforces decisions before actions execute. On AgentTrap, AIRGuard reduces Sonnet 4.6 attack success from 36.3% without defense to 5.5%. On DTAP-150, AIRGuard preserves 76.0% benign utility with Haiku 4.5, compared with 52.0% for ARGUS and 42.0% for MELON. An ablation further shows that prompt-only policy helps only modestly, whereas a dedicated runtime authority-control layer gives the agent system direct control over tool-mediated side effects. Code and data are available at https://github.com/Sophie508/AIRGuard.
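To make the authority-confusion rule concrete, here is a minimal Python sketch of an action-time check. It is an illustration under stated assumptions, not AIRGuard's implementation: the names Action, StepAuthority, Trust, SIDE_EFFECT_OPS, and authorize are hypothetical, the action schema is invented for this example, and side-effect simulation and cross-step auditing are left out. It shows only the core idea from the abstract: untrusted context may drive reads that inform reasoning, but it may not authorize a side effect.

```python
from dataclasses import dataclass, field
from enum import Enum


class Trust(Enum):
    TRUSTED = "trusted"      # e.g. the user's own instruction
    UNTRUSTED = "untrusted"  # e.g. a fetched web page or a tool result


@dataclass(frozen=True)
class Action:
    """A normalized tool call (hypothetical schema, not from the paper)."""
    tool: str
    operation: str   # e.g. "read", "write", "send", "execute"
    target: str
    origin: Trust    # trust level of the context that motivated this call


@dataclass
class StepAuthority:
    """Step-level authority derived from the user's task (hypothetical)."""
    allowed: set[tuple[str, str]] = field(default_factory=set)  # (tool, operation)
    trusted_targets: set[str] = field(default_factory=set)


# Assumption: these operation names are the ones treated as side effects.
SIDE_EFFECT_OPS = {"write", "send", "execute", "delete"}


def authorize(action: Action, authority: StepAuthority) -> tuple[bool, str]:
    """Decide before execution whether an action may run."""
    # Least privilege: anything outside the step's authority is denied.
    if (action.tool, action.operation) not in authority.allowed:
        return False, "outside step-level authority"
    # Operations without side effects may be driven by any context.
    if action.operation not in SIDE_EFFECT_OPS:
        return True, "no side effect"
    # Untrusted context may inform reasoning but not authorize side effects.
    if action.origin is Trust.UNTRUSTED:
        return False, "side effect requested by untrusted context"
    # Side effects must also land on a target the task trusts.
    if action.target not in authority.trusted_targets:
        return False, "untrusted target"
    return True, "authorized"


authority = StepAuthority(
    allowed={("fs", "read"), ("email", "send")},
    trusted_targets={"alice@example.com"},
)
injected = Action("email", "send", "alice@example.com", Trust.UNTRUSTED)
decision, reason = authorize(injected, authority)
print(decision, reason)
```

In this sketch the send is denied even though the tool, the operation, and the recipient are all permitted for the step, because the request originated from untrusted context. That is the case a tool allowlist alone would miss.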

Problem

Research questions and friction points this paper is trying to address.

tool-using agents

authority confusion

runtime authorization

side effects

security

Innovation

Methods, ideas, or system contributions that make the work stand out.

runtime authority control

least privilege

tool-using agents