Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current defense mechanisms for large language model agents predominantly rely on mandatory checks, lacking intrinsic awareness and selectivity, thereby struggling to balance security and efficiency. This work proposes Spider-Sense, a novel event-driven defense framework grounded in Intrinsic Risk Sensing (IRS), which activates a tiered response mechanism only upon risk detection. By integrating lightweight similarity matching with internal deep reasoning, Spider-Sense ensures robust security while significantly reducing computational overhead. Notably, the framework operates without external models and enables dynamic trade-offs between efficiency and accuracy. Evaluated on S²Bench—a newly introduced lifecycle-aware benchmark—Spider-Sense achieves the lowest attack success and false positive rates among existing approaches, introducing merely 8.3% latency overhead and demonstrating superior or comparable performance across the board.

📝 Abstract
As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security challenges. Most existing agent defense mechanisms adopt a mandatory checking paradigm, in which security validation is forcibly triggered at predefined stages of the agent lifecycle. In this work, we argue that effective agent security should be intrinsic and selective rather than architecturally decoupled and mandatory. We propose Spider-Sense, an event-driven defense framework based on Intrinsic Risk Sensing (IRS), which allows agents to maintain latent vigilance and trigger defenses only upon risk perception. Once triggered, Spider-Sense invokes a hierarchical defense mechanism that trades off efficiency and precision: it resolves known patterns via lightweight similarity matching while escalating ambiguous cases to deep internal reasoning, thereby eliminating reliance on external models. To facilitate rigorous evaluation, we introduce S²Bench, a lifecycle-aware benchmark featuring realistic tool execution and multi-stage attacks. Extensive experiments demonstrate that Spider-Sense achieves competitive or superior defense performance, attaining the lowest Attack Success Rate (ASR) and False Positive Rate (FPR), with only a marginal latency overhead of 8.3%.
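The event-driven, two-tier screening described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `screen`, `Verdict`, `risk_sensed`, `known_attacks`, and `deep_reason`, the 0.8 threshold, and the use of `difflib` string similarity as the "lightweight similarity matching" step are all assumptions made for illustration. The `deep_reason` callable stands in for the agent's own internal reasoning.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Sequence


@dataclass
class Verdict:
    blocked: bool
    stage: str  # "none", "similarity", or "reasoning"


def similarity(a: str, b: str) -> float:
    # Cheap character-level similarity in [0, 1]; a stand-in for whatever
    # lightweight matcher a real system would use (assumption).
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def screen(
    event: str,
    risk_sensed: bool,
    known_attacks: Sequence[str],
    deep_reason: Callable[[str], bool],
    threshold: float = 0.8,  # hypothetical cut-off, not from the paper
) -> Verdict:
    # Event-driven: no defense work at all unless the agent sensed risk.
    if not risk_sensed:
        return Verdict(blocked=False, stage="none")

    # Tier 1: resolve known patterns with lightweight similarity matching.
    best = max((similarity(event, p) for p in known_attacks), default=0.0)
    if best >= threshold:
        return Verdict(blocked=True, stage="similarity")

    # Tier 2: escalate ambiguous cases to the (more expensive) reasoning step.
    return Verdict(blocked=deep_reason(event), stage="reasoning")
```

In this sketch a benign event with `risk_sensed=False` returns immediately without touching either tier, which is where the efficiency gain over mandatory checking would come from; only events that do not closely match a known pattern pay the cost of `deep_reason`.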
Problem

Research questions and friction points this paper is trying to address.

agent defense
intrinsic risk sensing
security validation
autonomous agents
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Intrinsic Risk Sensing
Hierarchical Adaptive Screening
Event-driven Defense
LLM Agent Security
S²Bench
Zhenxiong Yu
SUFE
Zhi Yang
SUFE
Zhiheng Jin
SUFE
Shuhe Wang
Peking University, University of Melbourne
Natural Language Processing, Machine Learning
Heng Zhang
QuantaAlpha
Yanlin Fei
CMU
Lingfeng Zeng
Shanghai University of Finance and Economics
Large language models
Fangqi Lou
SUFE
Shuo Zhang
QuantaAlpha
Tu Hu
QuantaAlpha
Jingping Liu
ECUST
large language model, knowledge graph
Rongze Chen
QuantaAlpha
Xingyu Zhu
Princeton University
Kunyi Wang
UBC; KAUST
Vision, Graphics
Chaofa Yuan
QuantaAlpha
Xin Guo
Shanghai University of Finance and Economics
Large language models, conformal prediction
Zhaowei Liu
SUFE
Feipeng Zhang
XJTU
Jie Huang
SUFE
Huacan Wang
QuantaAlpha
Ronghao Chen
QuantaAlpha
Liwen Zhang
SUFE