Tracking Capabilities for Safer Agents

📅 2026-03-01

📈 Citations: 0

✨ Influential: 0

career value

156K/year

🤖 AI Summary

This work proposes a capability-based security framework implemented in Scala 3 to mitigate safety risks—such as privacy leakage, unintended side effects, and prompt injection—that arise when AI agents interact with the real world through tool invocation. Leveraging Scala 3’s type system with capture checking, the framework enables static control over resource access and computational effects. By integrating capability tracking with localized purity verification, it effectively prevents sensitive data exfiltration and malicious side effects while preserving agent task performance with negligible overhead. Experimental evaluation demonstrates that the extensible framework reliably blocks a range of unsafe behaviors, offering formal, provable security guarantees for AI agents operating in complex environments.

Technology Category

Application Category

📝 Abstract

AI agents that interact with the real world through tool calls pose fundamental safety challenges: agents might leak private information, cause unintended side effects, or be manipulated through prompt injection. To address these challenges, we propose to put the agent in a programming-language-based "safety harness": instead of calling tools directly, agents express their intentions as code in a capability-safe language: Scala 3 with capture checking. Capabilities are program variables that regulate access to effects and resources of interest. Scala's type system tracks capabilities statically, providing fine-grained control over what an agent can do. In particular, it enables local purity, the ability to enforce that sub-computations are side-effect-free, preventing information leakage when agents process classified data. We demonstrate that extensible agent safety harnesses can be built by leveraging a strong type system with tracked capabilities. Our experiments show that agents can generate capability-safe code with no significant loss in task performance, while the type system reliably prevents unsafe behaviors such as information leakage and malicious side effects.

Problem

Research questions and friction points this paper is trying to address.

AI safety

tool use

information leakage

prompt injection

side effects

Innovation

Methods, ideas, or system contributions that make the work stand out.

capability safety

capture checking

local purity