Security Considerations for Artificial Intelligence Agents

📅 2026-03-12

📈 Citations: 0

✨ Influential: 0

career value

277K/year

🤖 AI Summary

This work addresses emerging confidentiality, integrity, and availability risks introduced by advanced AI agents that transcend traditional security boundaries through capabilities such as tool use and multi-agent collaboration. It presents the first systematic identification of agent-specific security failure modes, including indirect prompt injection, behavioral obfuscation, and long-horizon cascading failures. To mitigate these threats, the paper proposes a layered defense architecture aligned with the NIST Risk Management Framework, integrating input filtering, model-level safeguards, sandboxed execution, and deterministic policy enforcement for high-risk operations. The study delineates the effectiveness boundaries of current defenses and outlines critical research directions, including adaptive security benchmarks, delegation mechanisms, and principled access control models tailored to autonomous agent ecosystems.

Technology Category

Application Category

📝 Abstract

This article, a lightly adapted version of Perplexity's response to NIST/CAISI Request for Information 2025-0035, details our observations and recommendations concerning the security of frontier AI agents. These insights are informed by Perplexity's experience operating general-purpose agentic systems used by millions of users and thousands of enterprises in both controlled and open-world environments. Agent architectures change core assumptions around code-data separation, authority boundaries, and execution predictability, creating new confidentiality, integrity, and availability failure modes. We map principal attack surfaces across tools, connectors, hosting boundaries, and multi-agent coordination, with particular emphasis on indirect prompt injection, confused-deputy behavior, and cascading failures in long-running workflows. We then assess current defenses as a layered stack: input-level and model-level mitigations, sandboxed execution, and deterministic policy enforcement for high-consequence actions. Finally, we identify standards and research gaps, including adaptive security benchmarks, policy models for delegation and privilege control, and guidance for secure multi-agent system design aligned with NIST risk management principles.

Problem

Research questions and friction points this paper is trying to address.

AI agents

security

attack surfaces

confidentiality

integrity

Innovation

Methods, ideas, or system contributions that make the work stand out.

AI agent security

indirect prompt injection

confused-deputy attacks