Beyond Static Sandboxing: Learned Capability Governance for Autonomous AI Agents

📅 2026-04-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the security risks and resource inefficiencies that arise when autonomous AI agents are granted access to every available tool by default. The authors propose Aethelgard, a framework presented as the first to learn the minimum viable capability set for each task type. Its four-layer adaptive governance mechanism combines proximal policy optimization (PPO) reinforcement learning, a hybrid rule-based and fine-tuned classifier, runtime scoping of the tools the agent is aware of, and interception of tool calls before execution, so that each session receives only the permissions its task requires. The reported experiments show a substantial reduction in capability overprovisioning, with improved operational efficiency and security guarantees maintained.

📝 Abstract

Autonomous AI agents built on open-source runtimes such as OpenClaw expose every available tool to every session by default, regardless of the task. A summarization task receives the same shell execution, subagent spawning, and credential access capabilities as a code deployment task. This 15x overprovisioning ratio illustrates what we call the capability overprovisioning problem. Existing defenses, including the NemoClaw container sandbox and the Cisco DefenseClaw skill scanner, address containment and threat detection but do not learn the minimum viable capability set for each task type. We present Aethelgard, a four-layer adaptive governance framework that enforces least privilege for AI agents through a learned policy. Layer 1, the Capability Governor, dynamically scopes which tools the agent is aware of in each session. Layer 3, the Safety Router, intercepts tool calls before execution using a hybrid rule-based and fine-tuned classifier. Layer 2, the RL Learning Policy, trains a PPO policy on the accumulated audit log to learn the minimum viable skill set for each task type.
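
To make the division of labor between the layers concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical tool catalog, hard-coded per-task allowlists standing in for what the learned policy would produce, and a plain substring rule standing in for the hybrid classifier. Every name in it (ALL_TOOLS, TASK_ALLOWLIST, open_session, route_call) is invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical tool catalog; the real runtime's tool names are not given in the abstract.
ALL_TOOLS = frozenset({
    "read_file", "summarize_text", "shell_exec",
    "spawn_subagent", "read_credentials", "deploy",
})

# Assumption: hard-coded allowlists stand in for the capability sets
# that the abstract says a PPO policy learns from the audit log.
TASK_ALLOWLIST = {
    "summarization": frozenset({"read_file", "summarize_text"}),
    "code_deployment": frozenset({"read_file", "shell_exec", "deploy"}),
}

# Assumption: a toy substring rule stands in for the hybrid
# rule-based and fine-tuned classifier.
DENY_SUBSTRINGS = ("rm -rf", "curl ")


@dataclass
class Session:
    task_type: str
    visible_tools: frozenset
    audit_log: list = field(default_factory=list)


def open_session(task_type: str) -> Session:
    """Capability scoping: expose only the tools allowed for this task type."""
    # Unknown task types fail closed and see no tools at all.
    visible = TASK_ALLOWLIST.get(task_type, frozenset())
    return Session(task_type=task_type, visible_tools=visible)


def route_call(session: Session, tool: str, argument: str) -> bool:
    """Call interception: decide before execution and record the decision."""
    if tool not in session.visible_tools:
        verdict = "deny:out_of_scope"
    elif any(pattern in argument for pattern in DENY_SUBSTRINGS):
        verdict = "deny:rule_match"
    else:
        verdict = "allow"
    # The audit log is the data a learning layer could later train on.
    session.audit_log.append((session.task_type, tool, verdict))
    return verdict == "allow"


if __name__ == "__main__":
    summarize = open_session("summarization")
    print(route_call(summarize, "summarize_text", "report.txt"))
    print(route_call(summarize, "shell_exec", "ls"))
    print(summarize.audit_log)
```

In this sketch a summarization session never sees shell_exec, so the call is refused as out of scope before any rule is consulted, and both the allowed and the refused call end up in the audit log. Failing closed for unknown task types is a design choice of the sketch, not something the abstract specifies.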

Problem

Research questions and friction points this paper is trying to address.

capability overprovisioning

autonomous AI agents

least privilege

tool access control

task-specific capabilities

Innovation

Methods, ideas, or system contributions that make the work stand out.

learned capability governance

least privilege

autonomous AI agents