🤖 AI Summary
Large language model (LLM)-driven computer-use agents pose novel security risks, including erroneous tool invocation and harmful operations, due to the instability of LLM outputs. Method: This paper proposes AgentSentinel, an end-to-end, real-time safety defense framework. Its core innovations are: (1) a safety detection mechanism that jointly analyzes the task context and the system's behavioral traces; (2) BadComputerUse, the first benchmark for this setting, comprising 60 attack scenarios across six attack categories; and (3) fine-grained runtime auditing via integrated system-call monitoring, behavior tracing, context-aware analysis, and dynamic interception. Results: On four mainstream LLMs, the BadComputerUse attacks succeed in 87% of scenarios on average, while AgentSentinel achieves a mean defense success rate of 79.6%, significantly outperforming baseline methods.
📝 Abstract
Large Language Models (LLMs) have been increasingly integrated into computer-use agents, which can autonomously operate tools on a user's computer to accomplish complex tasks. However, due to the inherently unstable and unpredictable nature of LLM outputs, these agents may issue unintended tool commands or incorrect inputs, leading to potentially harmful operations. Unlike traditional security risks stemming from insecure user prompts, tool executions resulting from LLM-driven decisions introduce new and unique security challenges, and these vulnerabilities span all components of a computer-use agent. To address these risks, we propose AgentSentinel, an end-to-end, real-time defense framework that mitigates potential security threats on a user's computer. AgentSentinel intercepts all sensitive operations within agent-related services and halts execution until a comprehensive security audit is completed. Its auditing mechanism introduces a novel inspection process that correlates the current task context with the system traces generated during task execution. To thoroughly evaluate AgentSentinel, we present BadComputerUse, a benchmark consisting of 60 diverse attack scenarios across six attack categories; these attacks achieve an 87% average success rate against four state-of-the-art LLMs. Our evaluation shows that AgentSentinel achieves an average defense success rate of 79.6%, significantly outperforming all baseline defenses.
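The intercept-then-audit loop described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual implementation: the class name `SentinelGate`, the set of sensitive operations, and the substring-based "correlation" check are all invented stand-ins for AgentSentinel's real system-call monitoring and context-aware analysis.

```python
from dataclasses import dataclass

@dataclass
class AuditDecision:
    allowed: bool
    reason: str

class SentinelGate:
    """Toy intercept-then-audit gate (illustrative names, not the paper's API)."""

    # Hypothetical set of operations treated as sensitive.
    SENSITIVE_OPS = {"file_delete", "shell_exec", "network_send"}

    def __init__(self, task_context: str):
        self.task_context = task_context
        self.trace: list[str] = []  # system trace accumulated during the task

    def audit(self, op: str, arg: str) -> AuditDecision:
        # Stand-in for context/trace correlation: flag a sensitive operation
        # whose target never appears in the current task context.
        if arg not in self.task_context:
            return AuditDecision(False, f"{op}({arg}) unrelated to task context")
        return AuditDecision(True, "consistent with task context")

    def invoke(self, op: str, arg: str) -> str:
        self.trace.append(f"{op}:{arg}")          # record every operation
        if op in self.SENSITIVE_OPS:              # halt and audit before executing
            decision = self.audit(op, arg)
            if not decision.allowed:
                return f"BLOCKED: {decision.reason}"
        return f"EXECUTED: {op}({arg})"

gate = SentinelGate(task_context="summarize report.txt")
print(gate.invoke("file_read", "report.txt"))     # non-sensitive: runs directly
print(gate.invoke("shell_exec", "report.txt"))    # sensitive but on-task: allowed
print(gate.invoke("file_delete", "/etc/passwd"))  # sensitive and off-task: blocked
```

The key design point mirrored here is that execution is suspended at the interception point until the audit returns a decision, so a harmful operation is stopped before it touches the system rather than detected after the fact.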