The Trust Paradox in LLM-Based Multi-Agent Systems: When Collaboration Becomes a Security Vulnerability

📅 2025-10-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies and formalizes the Trust–Vulnerability Paradox (TVP) in large language model (LLM)-based multi-agent systems: increased inter-agent trust enhances collaborative performance yet concurrently exacerbates the risks of sensitive-information over-exposure and privilege escalation. To address this, the paper adopts Minimum Necessary Information (MNI) as the security baseline, constructs an evaluation dataset covering 3 macro scenes and 19 fine-grained sub-scenes, and defines two unified quantitative metrics: Over-Exposure Rate (OER) and Authorization Drift (AD). Within a closed-loop interactive experimental framework, trust mechanisms are explicitly parameterized and defensive strategies, including Sensitive Information Repartitioning and Guardian-Agent enablement, are integrated and systematically validated across diverse LLM backends and orchestration architectures. Experiments reveal that high trust significantly improves task success rates but induces nonlinear growth in security risks, with strong heterogeneity in trust-to-risk mappings across systems. The proposed defenses effectively mitigate over-exposure and authorization drift, establishing the first reproducible, quantifiable security-assessment baseline for trustworthy multi-agent systems.

📝 Abstract
Multi-agent systems powered by large language models are advancing rapidly, yet the tension between mutual trust and security remains underexplored. We introduce and empirically validate the Trust-Vulnerability Paradox (TVP): increasing inter-agent trust to enhance coordination simultaneously expands risks of over-exposure and over-authorization. To investigate this paradox, we construct a scenario-game dataset spanning 3 macro scenes and 19 sub-scenes, and run extensive closed-loop interactions with trust explicitly parameterized. Using Minimum Necessary Information (MNI) as the safety baseline, we propose two unified metrics: Over-Exposure Rate (OER) to detect boundary violations, and Authorization Drift (AD) to capture sensitivity to trust levels. Results across multiple model backends and orchestration frameworks reveal consistent trends: higher trust improves task success but also heightens exposure risks, with heterogeneous trust-to-risk mappings across systems. We further examine defenses such as Sensitive Information Repartitioning and Guardian-Agent enablement, both of which reduce OER and attenuate AD. Overall, this study formalizes TVP, establishes reproducible baselines with unified metrics, and demonstrates that trust must be modeled and scheduled as a first-class security variable in multi-agent system design.
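The abstract's two metrics can be sketched in code: OER measures disclosure beyond the MNI baseline, while AD captures how much granted authorization expands as trust rises. The function names and exact formulas below are illustrative assumptions for intuition, not the paper's precise definitions.

```python
# Illustrative sketch of the two metrics; names and formulas are
# assumptions, not the paper's exact definitions.

def over_exposure_rate(disclosed: set, mni: set) -> float:
    """Fraction of disclosed items that fall outside the Minimum
    Necessary Information (MNI) boundary for the task."""
    if not disclosed:
        return 0.0
    over = disclosed - mni  # items exceeding the MNI baseline
    return len(over) / len(disclosed)

def authorization_drift(granted_by_trust: dict) -> float:
    """Spread of granted privileges across trust levels: 0 means
    authorization is insensitive to trust; values near 1 mean most
    privileges appear only at high trust."""
    counts = [len(perms) for _, perms in sorted(granted_by_trust.items())]
    if not counts or max(counts) == 0:
        return 0.0
    return (max(counts) - min(counts)) / max(counts)

# Example: an agent discloses 4 fields, but MNI covers only 2 of them.
oer = over_exposure_rate({"name", "ssn", "dob", "addr"}, {"name", "addr"})

# Example: privileges granted at three trust levels.
ad = authorization_drift({
    0.2: {"read"},
    0.5: {"read", "write"},
    0.9: {"read", "write", "admin"},
})
```

Under these assumed definitions, a high-trust run that leaks extra fields raises OER, and a system whose privilege set balloons with trust shows large AD, matching the paradox the abstract describes.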
Problem

Research questions and friction points this paper is trying to address.

Investigating the trust-vulnerability paradox in multi-agent LLM systems
Measuring over-exposure and authorization risks from inter-agent trust
Proposing defenses to mitigate security risks while maintaining collaboration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parameterized trust modeling for security optimization
Over-Exposure Rate metric detects boundary violations
Sensitive Information Repartitioning reduces authorization risks
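The Guardian-Agent and repartitioning ideas above amount to screening inter-agent messages against each recipient's trust level. A minimal sketch, assuming per-field sensitivity labels and a scalar trust score (all names and labels here are hypothetical, not from the paper):

```python
# Hypothetical guardian filter between agents: a field passes only if
# its sensitivity label is at or below the recipient's trust level.
# SENSITIVITY labels are assumed for illustration.

SENSITIVITY = {"task": 0.0, "name": 0.3, "ssn": 0.9}

def guardian_filter(message: dict, recipient_trust: float) -> dict:
    """Drop fields whose sensitivity exceeds the recipient's trust;
    unlabeled fields default to maximally sensitive (1.0)."""
    return {k: v for k, v in message.items()
            if SENSITIVITY.get(k, 1.0) <= recipient_trust}

msg = {"task": "book flight", "name": "Alice", "ssn": "123-45-6789"}
safe = guardian_filter(msg, 0.5)  # "ssn" is withheld from a mid-trust peer
```

The design choice is that filtering happens at the boundary rather than inside each agent, so raising trust for coordination never silently widens what any single peer can see beyond its clearance.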