The Authorization-Execution Gap Is a Major Safety and Security Problem in Open-World Agents

📅 2026-05-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This position paper addresses the authorization-execution gap (AEG), a safety and security risk in open-world agents that arises when what a principal intends to authorize diverges from what the agent actually executes. It identifies three structural sources of AEG: delegation-level incompleteness, channel-level corruption, and composition-level fragmentation. Rather than relying solely on one-shot upfront filtering or post-hoc auditing, it argues for authorization integrity checks applied during execution, because these sources arise dynamically across tool use, persistent state, and multi-agent handoffs. It further recommends that papers on open-world agents report process-level evidence showing where AEG was detected, constrained, and attributed to a structural source, in addition to outcome-level metrics.

📝 Abstract

This position paper argues that the Authorization-Execution Gap (AEG) is a major safety and security problem in open-world agents. The AEG is the divergence between what a principal intends to authorize and what an open-world agent ultimately executes. Because such agents act autonomously across tools, persistent state, and multi-agent handoffs, even small instances of authorization divergence can cause harm that is difficult or impossible to undo. We argue that many observed agent failures can be traced to three structural sources of AEG: delegation-level incompleteness, channel-level corruption, and composition-level fragmentation. The same observed failure may arise from any of these sources. Without identifying the source, a defense targeting the symptom alone cannot address the underlying cause. Agent safety and security should therefore emphasize source-oriented diagnosis and defense. Because the structural sources of AEG arise dynamically during execution, this approach necessarily requires authorization integrity checks applied during execution, rather than relying solely on one-shot upfront filtering or post-hoc audit. For NeurIPS, the implication is that papers on open-world agents should report not only outcome-level metrics such as task success or attack resistance, but also process-level evidence showing where AEG was detected, constrained, and attributed to a structural source during execution.
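To make the idea of an execution-time authorization integrity check concrete, here is a minimal Python sketch. It is not taken from the paper: the names `Grant`, `Step`, `Monitor`, and `Source`, the tool names, and the three simple rules used to attribute a blocked step to a structural source are all hypothetical and chosen only for illustration. A real system would need far richer models of intent, provenance, and composition.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Source(Enum):
    """The three structural sources of AEG named in the abstract."""
    DELEGATION = "delegation-level incompleteness"
    CHANNEL = "channel-level corruption"
    COMPOSITION = "composition-level fragmentation"


@dataclass(frozen=True)
class Step:
    tool: str    # tool the agent is about to call (hypothetical names)
    origin: str  # where the instruction came from, e.g. "principal" or "tool_output"


@dataclass
class Grant:
    allowed_tools: set[str]
    # (earlier, later) tool pairs that are each allowed alone but not in sequence
    forbidden_sequences: list[tuple[str, str]]


@dataclass
class Monitor:
    grant: Grant
    history: list[str] = field(default_factory=list)
    log: list[tuple[Step, Source]] = field(default_factory=list)

    def check(self, step: Step) -> Optional[Source]:
        """Return the attributed source, or None if the step is within the grant."""
        # Assumed rule 1: an instruction that did not come from the principal
        # is treated as arriving over a corrupted channel.
        if step.origin != "principal":
            return Source.CHANNEL
        # Assumed rule 2: the grant is silent about this tool, so the
        # delegation is incomplete with respect to it.
        if step.tool not in self.grant.allowed_tools:
            return Source.DELEGATION
        # Assumed rule 3: the step is allowed alone but completes a
        # forbidden sequence with an earlier executed step.
        for earlier, later in self.grant.forbidden_sequences:
            if step.tool == later and earlier in self.history:
                return Source.COMPOSITION
        return None

    def execute(self, step: Step) -> bool:
        """Check before acting; block and record process-level evidence on failure."""
        source = self.check(step)
        if source is not None:
            self.log.append((step, source))
            return False
        self.history.append(step.tool)  # a real agent would invoke the tool here
        return True


grant = Grant(
    allowed_tools={"read_file", "send_email"},
    forbidden_sequences=[("read_file", "send_email")],
)
monitor = Monitor(grant)
results = [
    monitor.execute(Step("read_file", "principal")),    # within the grant
    monitor.execute(Step("delete_file", "principal")),  # grant is silent
    monitor.execute(Step("read_file", "tool_output")),  # untrusted origin
    monitor.execute(Step("send_email", "principal")),   # forbidden after read_file
]
for step, source in monitor.log:
    print(f"blocked {step.tool} (origin={step.origin}): {source.value}")
```

In this sketch the check runs before every step instead of once up front, and the `log` plays the role of the process-level evidence the abstract asks papers to report: each blocked step is recorded together with the structural source it was attributed to.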

Problem

Research questions and friction points this paper is trying to address.

Authorization-Execution Gap

open-world agents

safety

security

authorization divergence

Innovation

Methods, ideas, or system contributions that make the work stand out.

Authorization-Execution Gap

open-world agents

authorization integrity