Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

📅 2026-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This survey addresses the new failure modes that arise when autonomous agents carry out multi-step tasks, which undermine their trustworthiness in high-stakes applications. It organizes the analysis around two core dimensions: Safety and Robustness, and Privacy and System Security. For each dimension, it clarifies key concepts, traces where risks emerge along the agent workflow, and summarizes stage-targeted mitigation strategies. Evaluation is consolidated into a unified metrics-and-benchmarks hub that covers both outcome and process signals (e.g., constraint violations, trace completeness, and adversarial success rates) and offers scenario-to-metric guidance for release gating. The survey closes with open challenges, including self-evolving agents, runtime monitoring and verification, privacy-preserving personalization, and the trust-utility trade-off, and with a case study of real-world security failures in open-source agentic systems.

📝 Abstract

Agentic AI systems -- Large Language Models (LLMs) augmented with planning, tool use, memory, and long-horizon interactions -- can execute complex tasks autonomously, but their multi-step trajectories introduce new failure modes that challenge trustworthiness. This survey provides a focused examination of trustworthy agentic AI through two core dimensions that are critical for high-risk deployments: Safety and Robustness, and Privacy and System Security. For each dimension, we clarify key concepts, identify where risks emerge along the agent workflow, and summarize stage-targeted mitigation strategies. Other trustworthiness aspects (value alignment, transparency, fairness, and accountability) are discussed as relevant context rather than parallel chapters. To support consistent comparison and deployment decisions, we consolidate evaluation into a unified metrics-and-benchmarks hub, emphasizing both outcome and process signals (e.g., constraint violations, trace completeness, and adversarial success rates) and offering scenario-to-metric guidance for release gating. We conclude by outlining open challenges such as self-evolving agents, runtime monitoring and verification, privacy-preserving personalization, and the trust-utility trade-off, and present a case study of real-world security failures in open-source agentic systems. Our goal is to serve as a practical reference for researchers and practitioners building trustworthy agentic systems in high-stakes environments.
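As a rough illustration of the outcome and process signals named in the abstract, the sketch below computes a constraint violation rate, trace completeness, and adversarial success rate from per-episode records, then applies a simple release gate. The abstract does not define these metrics formally, so the per-step definitions, the `Episode` fields, the function names, and the threshold values are all assumptions made for this example, not the survey's method.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One agent run (hypothetical record layout, assumed for illustration)."""
    steps: int                      # total agent steps taken
    logged_steps: int               # steps with a complete trace record
    violations: int                 # steps that broke a declared constraint
    adversarial: bool               # True if the run included an attack attempt
    attack_succeeded: bool = False  # only meaningful when adversarial is True


def summarize(episodes):
    """Aggregate the three signals under the assumed per-step definitions."""
    total_steps = sum(e.steps for e in episodes)
    attacks = [e for e in episodes if e.adversarial]
    return {
        # fraction of all steps that violated a constraint
        "constraint_violation_rate": (
            sum(e.violations for e in episodes) / total_steps if total_steps else 0.0
        ),
        # fraction of all steps that have a complete trace record
        "trace_completeness": (
            sum(e.logged_steps for e in episodes) / total_steps if total_steps else 0.0
        ),
        # fraction of adversarial runs in which the attack succeeded
        "adversarial_success_rate": (
            sum(e.attack_succeeded for e in attacks) / len(attacks) if attacks else 0.0
        ),
    }


def release_gate(metrics, max_violation=0.01, min_trace=0.99, max_asr=0.05):
    """Return True only if every signal clears its (assumed) threshold."""
    return (
        metrics["constraint_violation_rate"] <= max_violation
        and metrics["trace_completeness"] >= min_trace
        and metrics["adversarial_success_rate"] <= max_asr
    )


if __name__ == "__main__":
    runs = [
        Episode(steps=10, logged_steps=10, violations=0, adversarial=False),
        Episode(steps=10, logged_steps=8, violations=1, adversarial=True,
                attack_succeeded=True),
        Episode(steps=20, logged_steps=20, violations=0, adversarial=True),
    ]
    metrics = summarize(runs)
    print(metrics)
    print("release allowed:", release_gate(metrics))
```

In practice the thresholds would depend on the deployment scenario, which is the kind of scenario-to-metric mapping the abstract describes; the defaults here are placeholders.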

Problem

Research questions and friction points this paper is trying to address.

agentic AI

trustworthiness

safety

robustness

privacy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic AI

Trustworthiness

Safety and Robustness