A Survey on Trustworthy LLM Agents: Threats and Countermeasures

📅 2025-03-12
🤖 AI Summary
Addressing emerging trustworthiness challenges in LLM-based agents and multi-agent systems (MAS), threats that extend beyond single-model reliability, this paper introduces TrustAgent, a foundational framework for trustworthy agent design. Methodologically, it establishes an intrinsic/extrinsic dual-dimensional taxonomy of trustworthiness, formally defining the "trustworthy agent" paradigm; constructs a unified taxonomy spanning attacks, defenses, and evaluation; and enables multi-dimensional trust modeling (covering robustness, explainability, alignment, and more) via component-level decomposition (brain, memory, tools, user, environment, and interaction) with corresponding technical mappings. The framework yields a comprehensive research landscape of LLM-agent trustworthiness, offering a systematic guideline and theoretical foundation for designing, evaluating, and deploying trustworthy agents.

📝 Abstract
With the rapid evolution of Large Language Models (LLMs), LLM-based agents and Multi-agent Systems (MAS) have significantly expanded the capabilities of LLM ecosystems. This evolution stems from empowering LLMs with additional modules such as memory, tools, environment, and even other agents. However, this advancement has also introduced more complex issues of trustworthiness, which previous research focused solely on LLMs could not cover. In this survey, we propose the TrustAgent framework, a comprehensive study on the trustworthiness of agents, characterized by modular taxonomy, multi-dimensional connotations, and technical implementation. By thoroughly investigating and summarizing newly emerged attacks, defenses, and evaluation methods for agents and MAS, we extend the concept of Trustworthy LLM to the emerging paradigm of Trustworthy Agent. In TrustAgent, we begin by deconstructing and introducing various components of the Agent and MAS. Then, we categorize their trustworthiness into intrinsic (brain, memory, and tool) and extrinsic (user, agent, and environment) aspects. Subsequently, we delineate the multifaceted meanings of trustworthiness and elaborate on the implementation techniques of existing research related to these internal and external modules. Finally, we present our insights and outlook on this domain, aiming to provide guidance for future endeavors.
Problem

Research questions and friction points this paper is trying to address.

Addresses trustworthiness in LLM-based agents and Multi-agent Systems.
Proposes TrustAgent framework for modular taxonomy and technical implementation.
Investigates attacks, defenses, and evaluation methods for Trustworthy Agents.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular taxonomy for trustworthiness analysis
Multi-dimensional trustworthiness connotations explored
Technical implementation of TrustAgent framework
Authors

Miao Yu
Squirrel AI Learning
Fanci Meng
Squirrel AI Learning
Xinyun Zhou
Nanyang Technological University
Shilong Wang
Squirrel AI Learning
Junyuan Mao
National University of Singapore
Linsey Pang
Salesforce
Tianlong Chen
UNC Chapel Hill
Kun Wang
Nanyang Technological University
Xinfeng Li
Nanyang Technological University
Yongfeng Zhang
Rutgers University
Bo An
Nanyang Technological University
Qingsong Wen
Squirrel AI Learning