Do Coding Agents Understand Least-Privilege Authorization?

📅 2026-05-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing coding agents struggle to infer the minimal permission boundary a task requires: granting too little authority breaks the task, while granting too much exposes sensitive surfaces. This work formalizes the problem as permission-boundary inference and introduces Sufficiency-Tightness Decomposition, a two-stage policy generation framework that first forward-simulates the task to produce a coverage-oriented policy and then audits each granted entry for grounding and sensitivity. It also introduces AuthBench, a benchmark of 120 realistic terminal tasks that pairs task instructions and terminal environments with human-reviewed permission labels and executable validators for utility and attack outcomes. In the reported experiments, the decomposition improves sensitive-task success by up to 15.8% on tightness-biased models and reduces attack success across all evaluated models.
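To make the two failure directions concrete, the following minimal sketch compares a predicted file-level policy against a reference policy and reports which permissions are missing (the task would likely fail) and which are excess (unnecessary exposure). The `Policy` type, the `compare` helper, and the example paths are hypothetical illustrations; they are not AuthBench's label format or scoring code.

```python
from typing import Dict, Set, Tuple

# Assumption: a policy maps a file path to a set of modes from {"r", "w", "x"}.
Policy = Dict[str, Set[str]]


def entries(policy: Policy) -> Set[Tuple[str, str]]:
    """Flatten a policy into (path, mode) pairs so policies can be compared as sets."""
    return {(path, mode) for path, modes in policy.items() for mode in modes}


def compare(predicted: Policy, reference: Policy) -> Dict[str, Set[Tuple[str, str]]]:
    """Return permissions the prediction omits and permissions it grants needlessly."""
    pred, ref = entries(predicted), entries(reference)
    return {"missing": ref - pred, "excess": pred - ref}


# Hypothetical task: patch src/app.py and run the test script.
reference = {"src/app.py": {"r", "w"}, "tests/run.sh": {"r", "x"}}
predicted = {"src/app.py": {"r"}, "tests/run.sh": {"r", "x"}, ".env": {"r"}}

report = compare(predicted, reference)
print("missing:", sorted(report["missing"]))
print("excess:", sorted(report["excess"]))
```

In this example the predicted policy is wrong in both directions at once: it omits write access to `src/app.py` and grants an unneeded read on `.env`, so neither "be more conservative" nor "be more permissive" alone would repair it.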

📝 Abstract

As coding agents gain access to shells, repositories, and user files, least-privilege authorization becomes a prerequisite for safe deployment: an agent should receive enough authority to complete the task, without unnecessary authority that exposes sensitive surfaces. To study whether current models can infer this boundary themselves, we first introduce permission-boundary inference, where a model maps a task instruction and terminal environment to a file-level read/write/execute policy, and AuthBench, a benchmark of 120 realistic terminal tasks with human-reviewed permission labels and executable validators for utility and attack outcomes. AuthBench shows that authorization is not a simple conservative-versus-permissive calibration problem: frontier models often omit permissions required by the execution chain while also granting unused or sensitive accesses. Increasing inference-time reasoning does not resolve this mismatch. Instead, each model moves toward a model-specific authorization attractor: more reasoning makes it more consistent in its own failure mode, whether broad-but-exposed or tight-but-brittle. This suggests that direct policy generation is the bottleneck, because a single generation must both discover all necessary accesses and reject all unnecessary ones. We therefore propose Sufficiency-Tightness Decomposition, which first generates a coverage-oriented policy by forward-simulating the task and then audits each granted entry for grounding and sensitivity. Across tested models, this decomposition improves sensitive-task success by up to 15.8% on tightness-biased models while reducing attack success across all evaluated models.
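The two-stage idea can be sketched roughly as follows, under several assumptions. The candidate list stands in for the output of the coverage-oriented stage, which in the paper is produced by a model forward-simulating the task. The `Entry` type, the `audit` function, the sensitivity patterns, and the rule that grounded-but-sensitive entries are set aside for review are all hypothetical choices made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Tuple


@dataclass(frozen=True)
class Entry:
    path: str
    mode: str      # "r", "w", or "x"
    evidence: str  # simulated step that needs this access; "" if none was given


# Assumption: an illustrative pattern list, not a list taken from the paper.
SENSITIVE_PATTERNS = (".env", "*.pem", "*/.ssh/*")


def is_sensitive(path: str) -> bool:
    return any(fnmatchcase(path, pattern) for pattern in SENSITIVE_PATTERNS)


def audit(candidates: List[Entry]) -> Tuple[List[Entry], List[Entry], List[Entry]]:
    """Tightness stage: check each granted entry for grounding, then sensitivity."""
    kept, dropped, review = [], [], []
    for entry in candidates:
        if not entry.evidence:
            dropped.append(entry)   # not grounded in any simulated step
        elif is_sensitive(entry.path):
            review.append(entry)    # grounded, but touches a sensitive surface
        else:
            kept.append(entry)
    return kept, dropped, review


# Stand-in for the sufficiency stage: a deliberately broad candidate policy.
candidates = [
    Entry("src/parser.py", "r", "step 1: read parser source"),
    Entry("src/parser.py", "w", "step 2: apply the fix"),
    Entry("tests/run_tests.sh", "x", "step 3: run the test script"),
    Entry("docs/notes.md", "r", ""),
    Entry(".env", "r", "step 3: tests load configuration"),
]

kept, dropped, review = audit(candidates)
for label, group in (("kept", kept), ("dropped", dropped), ("review", review)):
    print(label, [(e.path, e.mode) for e in group])
```

Splitting the work this way means the first stage only has to worry about coverage and the second only about rejection, which is the separation the abstract argues a single direct generation cannot achieve.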

Problem

Research questions and friction points this paper is trying to address.

least-privilege authorization

permission-boundary inference

coding agents

AuthBench

security

Innovation

Methods, ideas, or system contributions that make the work stand out.

least-privilege authorization

permission-boundary inference

Sufficiency-Tightness Decomposition