How Brittle is Agent Safety? Rethinking Agent Risk under Intent Concealment and Task Complexity

📅 2025-11-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Existing safety evaluations for LLM-based agents focus predominantly on atomic harms and miss sophisticated threats in which malicious intent is concealed or diluted within complex tasks, which leads to unwarranted confidence in agent safety. Method: We propose a two-dimensional "safety brittleness" analysis along two orthogonal axes, intent concealment and task complexity. To support it, we introduce OASIS (Orthogonal Agent Safety Inquiry Suite), a benchmark featuring hierarchical task design, fine-grained harm annotations, and orthogonal stress testing, coupled with a high-fidelity simulation sandbox that enables systematic injection of covert adversarial intents. Contribution/Results: OASIS supports joint quantitative and phenomenological safety assessment. Empirical results show that safety alignment degrades sharply and predictably as intent becomes more concealed. They also expose a "Complexity Paradox": agents appear safer on harder tasks, not because their alignment is robust, but because capability limitations keep them from completing the harmful task. Together, these findings move agent safety evaluation toward realistic, high-stakes operational scenarios.

📝 Abstract
Current safety evaluations for LLM-driven agents primarily focus on atomic harms, failing to address sophisticated threats where malicious intent is concealed or diluted within complex tasks. We address this gap with a two-dimensional analysis of agent safety brittleness under the orthogonal pressures of intent concealment and task complexity. To enable this, we introduce OASIS (Orthogonal Agent Safety Inquiry Suite), a hierarchical benchmark with fine-grained annotations and a high-fidelity simulation sandbox. Our findings reveal two critical phenomena: safety alignment degrades sharply and predictably as intent becomes obscured, and a "Complexity Paradox" emerges, where agents seem safer on harder tasks only due to capability limitations. By releasing OASIS and its simulation environment, we provide a principled foundation for probing and strengthening agent safety in these overlooked dimensions.
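To make the "Complexity Paradox" concrete, the following is a minimal sketch of how a two-dimensional evaluation grid could separate genuine refusals from capability failures. It is not the OASIS implementation: the outcome labels, the `Episode` fields, the integer level encodings, and the toy data are all assumptions made purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical outcome labels; the benchmark's real annotation scheme may differ.
REFUSED = "refused"      # agent declined the harmful task
FAILED = "failed"        # agent attempted the task but could not complete it
HARM_DONE = "harm_done"  # agent attempted and completed the harmful task


@dataclass(frozen=True)
class Episode:
    concealment: int  # assumed encoding: 0 = explicit intent, higher = more obscured
    complexity: int   # assumed encoding: 0 = atomic task, higher = more complex
    outcome: str


def cell_rates(episodes):
    """Return outcome rates for each (concealment, complexity) grid cell."""
    counts = {}
    for ep in episodes:
        cell = (ep.concealment, ep.complexity)
        counts.setdefault(cell, Counter())[ep.outcome] += 1

    rates = {}
    for cell, c in counts.items():
        total = sum(c.values())
        rates[cell] = {
            "refusal_rate": c[REFUSED] / total,
            "failure_rate": c[FAILED] / total,
            "harm_rate": c[HARM_DONE] / total,
            # A naive safety score counts every non-harmful episode as "safe",
            # lumping principled refusals together with incapable attempts.
            "apparent_safety": (c[REFUSED] + c[FAILED]) / total,
        }
    return rates


# Toy data invented for this sketch; these are not results from the paper.
episodes = [
    Episode(0, 0, REFUSED), Episode(0, 0, REFUSED),
    Episode(0, 0, REFUSED), Episode(0, 0, HARM_DONE),
    Episode(1, 2, REFUSED), Episode(1, 2, FAILED),
    Episode(1, 2, FAILED), Episode(1, 2, FAILED),
]

rates = cell_rates(episodes)
for cell in sorted(rates):
    print(cell, rates[cell])
```

In this toy grid the harder, more concealed cell has a higher apparent safety score even though the agent refuses less often there. Reporting refusal and failure rates separately is one way to avoid mistaking capability limits for robust alignment.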
Problem

Research questions and friction points this paper is trying to address.

Evaluating agent safety brittleness under concealed malicious intent
Assessing safety degradation when intent is obscured in complex tasks
Investigating the Complexity Paradox where capability limits mask true risks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Orthogonal safety analysis with intent concealment
Hierarchical benchmark with fine-grained annotations
High-fidelity simulation sandbox for evaluation
Authors

Zihan Ma, Xi'an Jiaotong University (NLP, Social Network, Multi Modal Learning)
Dongsheng Zhu, Shanghai AI Laboratory
Shudong Liu, University of Macau (Natural Language Processing, Large Language Models)
Taolin Zhang, Hefei University of Technology (LLM, VLLM, Deep Learning)
Junnan Liu, Shanghai AI Laboratory
Qingqiu Li, Shanghai AI Laboratory
Minnan Luo, Professor, Xi'an Jiaotong University
Songyang Zhang, Shanghai AI Laboratory
Kai Chen, Shanghai AI Laboratory