🤖 AI Summary
As AI agents gain increasing autonomy, security threats—particularly prompt injection—pose growing risks to agent integrity and confidentiality. Method: This paper introduces Fides, the first information-flow control (IFC) framework specifically designed for AI agent planners. It integrates dynamic taint tracking, confidentiality/integrity labels, and policy-enforcement mechanisms. Its core innovations include: (1) a formal IFC model tailored to planning processes; (2) security primitives enabling selective information hiding; and (3) a task taxonomy jointly optimizing security guarantees and functional utility. Contribution/Results: Implemented as an open-source secure planner, Fides significantly expands the set of tasks safely executable under strong, formal security guarantees—demonstrated via rigorous evaluation on the AgentDojo benchmark—while maintaining practical performance and usability.
📝 Abstract
As AI agents become increasingly autonomous and capable, ensuring their security against vulnerabilities such as prompt injection becomes critical. This paper explores the use of information-flow control (IFC) to provide security guarantees for AI agents. We present a formal model to reason about the security and expressiveness of agent planners. Using this model, we characterize the class of properties enforceable by dynamic taint-tracking and construct a taxonomy of tasks to evaluate security and utility trade-offs of planner designs. Informed by this exploration, we present Fides, a planner that tracks confidentiality and integrity labels, deterministically enforces security policies, and introduces novel primitives for selectively hiding information. Its evaluation in AgentDojo demonstrates that this approach broadens the range of tasks that can be securely accomplished. A tutorial to walk readers through the the concepts introduced in the paper can be found at https://github.com/microsoft/fides