A Safety and Security Framework for Real-World Agentic Systems

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses novel safety and security risks—such as tool misuse, cascading action chains, and unintended control amplification—that arise in enterprise-grade agentic AI systems due to dynamic interactions among models, coordinators, tools, and data during real-world deployment. Methodologically, it establishes the first unified, dynamic safety-and-security framework for agentic systems, featuring a fine-grained risk taxonomy and a mechanistic analysis of safety-security coupling in dynamic execution. It introduces a closed-loop governance paradigm integrating auxiliary AI-enhanced risk perception, collaborative sandboxed execution, and AI-driven red-teaming. Evaluated end-to-end on the NVIDIA AI-Q Research Assistant platform, the framework identifies and mitigates over ten classes of emergent risks. Additionally, it releases an open-source benchmark dataset comprising 10,000+ adversarial and defensive agent trajectories, providing both theoretical foundations and empirical infrastructure for agentic AI safety research.
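The closed-loop governance paradigm described above (auxiliary risk perception, sandboxed execution, AI-driven red-teaming) can be sketched as a plain control loop. The paper does not publish this code; the function names, callback signatures, and `Finding` record below are illustrative assumptions, not the framework's actual API.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """A risk surfaced by red-teaming, with its mitigation status."""
    risk: str
    mitigated: bool = False

def closed_loop_governance(workflow, perceive, red_team, mitigate, rounds=2):
    """One governance cycle per round:
    1. auxiliary AI models flag suspected contextual risks (perceive),
    2. sandboxed AI-driven red-teaming confirms which are exploitable (red_team),
    3. confirmed risks are contextually mitigated in the workflow (mitigate).
    Repeating the cycle checks that mitigations hold against fresh probing.
    """
    findings = []
    for _ in range(rounds):
        suspected = perceive(workflow)
        confirmed = red_team(workflow, suspected)  # executed in a sandbox
        for risk in confirmed:
            workflow = mitigate(workflow, risk)
            findings.append(Finding(risk, mitigated=True))
    return workflow, findings
```

With toy callbacks (perception always flags the same two risks, red-teaming only confirms unmitigated ones), a second round confirms nothing new, which is the closed-loop property the paradigm relies on.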

📝 Abstract
This paper introduces a dynamic and actionable framework for securing agentic AI systems in enterprise deployment. We contend that safety and security are not merely fixed attributes of individual models but also emergent properties arising from the dynamic interactions among models, orchestrators, tools, and data within their operating environments. We propose a new way of identifying novel agentic risks through the lens of user safety. Although safety and security are clearly separable for traditional LLMs and agentic models in isolation, they become coupled when viewed through the lens of safety in agentic systems. Building on this foundation, we define an operational agentic risk taxonomy that unifies traditional safety and security concerns with novel, uniquely agentic risks, including tool misuse, cascading action chains, and unintended control amplification, among others. At the core of our approach is a dynamic agentic safety and security framework that operationalizes contextual agentic risk management by using auxiliary AI models and agents, with human oversight, to assist in contextual risk discovery, evaluation, and mitigation. We further address one of the most challenging aspects of safety and security in agentic systems: risk discovery through sandboxed, AI-driven red teaming. We demonstrate the framework's effectiveness through a detailed case study of NVIDIA's flagship agentic research assistant, AI-Q Research Assistant, showcasing practical, end-to-end safety and security evaluations in complex, enterprise-grade agentic workflows. This risk discovery phase finds novel agentic risks that are then contextually mitigated. We also release the dataset from our case study, containing traces of over 10,000 realistic attack and defense executions of the agentic workflow, to help advance research in agentic safety.
Problem

Research questions and friction points this paper is trying to address.

Develops a framework for managing safety and security in enterprise AI agent systems
Identifies and classifies novel agentic risks like tool misuse and cascading failures
Demonstrates risk discovery and mitigation using AI-driven red teaming in workflows
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic framework for agentic AI security
Unified risk taxonomy with novel agentic risks
AI-driven red teaming for contextual risk mitigation
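A unified risk taxonomy like the one described above can be modeled as a small data structure that distinguishes traditional model-level concerns from uniquely agentic ones. The risk classes below are those named in the paper; the field names, categorization scheme, and descriptions are illustrative assumptions, not the paper's actual taxonomy schema.

```python
from dataclasses import dataclass
from enum import Enum

class RiskOrigin(Enum):
    # Traditional concerns apply to models in isolation; agentic risks
    # emerge only from interactions among models, orchestrators,
    # tools, and data during execution.
    TRADITIONAL_SAFETY = "traditional_safety"
    TRADITIONAL_SECURITY = "traditional_security"
    AGENTIC = "agentic"

@dataclass(frozen=True)
class Risk:
    name: str
    origin: RiskOrigin
    description: str  # paraphrased, not the paper's wording

TAXONOMY = [
    Risk("tool_misuse", RiskOrigin.AGENTIC,
         "agent invokes a tool outside its intended scope"),
    Risk("cascading_action_chain", RiskOrigin.AGENTIC,
         "one unsafe action triggers further unchecked actions"),
    Risk("control_amplification", RiskOrigin.AGENTIC,
         "agent acquires more authority than the user granted"),
    Risk("prompt_injection", RiskOrigin.TRADITIONAL_SECURITY,
         "adversarial instructions embedded in model input"),
]

def agentic_risks(taxonomy):
    """Select the uniquely agentic risks from a unified taxonomy."""
    return [r for r in taxonomy if r.origin is RiskOrigin.AGENTIC]
```

Keeping traditional and agentic risks in one structure mirrors the paper's argument that the two categories are coupled in deployed systems and should be managed by a single framework rather than separate pipelines.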
Shaona Ghosh (NVIDIA)
Barnaby Simkin (NVIDIA)
Kyriacos Shiarlis (Waymo)
Soumili Nandi (NVIDIA)
Dan Zhao (NVIDIA)
Matthew Fiedler (Lakera AI)
Julia Bazinska (Lakera AI)
Nikki Pope (NVIDIA)
Roopa Prabhu (NVIDIA)
Michael Demoret (NVIDIA)
Bartley Richardson (NVIDIA)