Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

📅 2026-05-17
🤖 AI Summary
This survey addresses the new failure modes that autonomous agents exhibit over multi-step trajectories, which undermine their trustworthiness in high-stakes applications. It examines trustworthy agentic AI along two core dimensions: Safety and Robustness, and Privacy and System Security. For each dimension, it clarifies key concepts, identifies where risks emerge along the agent workflow, and summarizes stage-targeted mitigation strategies. Evaluation is consolidated into a unified metrics-and-benchmarks hub that covers both outcome and process signals (e.g., constraint violations, trace completeness, and adversarial success rates) and offers scenario-to-metric guidance for release gating. The survey also outlines open challenges, including self-evolving agents, runtime monitoring and verification, privacy-preserving personalization, and the trust-utility trade-off, and presents a case study of real-world security failures in open-source agentic systems.
📝 Abstract
Agentic AI systems, which augment Large Language Models (LLMs) with planning, tool use, memory, and long-horizon interactions, can execute complex tasks autonomously, but their multi-step trajectories introduce new failure modes that challenge trustworthiness. This survey provides a focused examination of trustworthy agentic AI through two core dimensions that are critical for high-risk deployments: Safety and Robustness, and Privacy and System Security. For each dimension, we clarify key concepts, identify where risks emerge along the agent workflow, and summarize stage-targeted mitigation strategies. Other trustworthiness aspects (value alignment, transparency, fairness, and accountability) are discussed as relevant context rather than parallel chapters. To support consistent comparison and deployment decisions, we consolidate evaluation into a unified metrics-and-benchmarks hub, emphasizing both outcome and process signals (e.g., constraint violations, trace completeness, and adversarial success rates) and offering scenario-to-metric guidance for release gating. We conclude by outlining open challenges such as self-evolving agents, runtime monitoring and verification, privacy-preserving personalization, and the trust-utility trade-off, and present a case study of real-world security failures in open-source agentic systems. Our goal is to serve as a practical reference for researchers and practitioners building trustworthy agentic systems in high-stakes environments.
Problem

Research questions and friction points this paper is trying to address.

agentic AI
trustworthiness
safety
robustness
privacy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic AI
Trustworthiness
Safety and Robustness
Privacy and Security
Evaluation Benchmarking
Jinhu Qi
PhD candidate in CUHK CSE
Agentic AI, LLMs, Reasoning
Muzhi Li
The Chinese University of Hong Kong
Knowledge Graph, Natural Language Processing
Jiahong Liu
Faculty of Engineering, Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China.
Yuqin Shu
Faculty of Engineering, Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China.
Dianzhi Yu
Faculty of Engineering, Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China.
Shicheng Ma
Faculty of Engineering, Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China.
Wenqian Cui
Chinese University of Hong Kong
Deep Learning, Natural Language Processing, Large Language Models, AI Music, Music Generation
Yiyang Zhao
Ingdan Labs
Internet of Things, Mobile Computing
Yiyi Chen
PhD Candidate, Aalborg University
Machine Learning, Deep Learning, Natural Language Processing
Ruoxi Jiang
Artificial Intelligence Innovation and Incubation Institute, Fudan University, Shanghai, China; Shanghai Academy of AI for Science, Shanghai, China.
Irwin King
The Chinese University of Hong Kong
social computing, machine learning, AI, graph neural networks, NLP
Zenglin Xu
Fudan University
Machine Learning, Trustworthy AI, Federated Learning, Large Language Models, Time Series Analysis