Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw

📅 2026-05-11
🤖 AI Summary
This work addresses underexplored security risks in agentic language-model systems that operate over mutable execution contexts such as files, memory, and external tools. In these settings, vulnerabilities can originate outside the user prompt and go undetected by conventional evaluations that inspect only the final output. The authors propose DeepTrap, a framework that formalizes adversarial context manipulation as a black-box, multi-objective, trajectory-level optimization problem. DeepTrap combines risk-conditioned evaluation, multi-objective trajectory scoring, reward-guided beam search, and reflection-based deep probing to automatically uncover high-impact compromised contexts. On a 42-case benchmark spanning six vulnerability classes and seven operational scenarios, evaluated against nine target models, context manipulation induces substantial unsafe behavior while preserving task completion, which exposes the limits of safety evaluation based on final responses alone.
📝 Abstract
Agentic language-model systems increasingly rely on mutable execution contexts, including files, memory, tools, skills, and auxiliary artifacts, creating security risks beyond explicit user prompts. This paper presents DeepTrap, an automated framework for discovering contextual vulnerabilities in OpenClaw. DeepTrap formulates adversarial context manipulation as a black-box trajectory-level optimization problem that balances risk realization, benign-task preservation, and stealth. It combines risk-conditioned evaluation, multi-objective trajectory scoring, reward-guided beam search, and reflection-based deep probing to identify high-value compromised contexts. We construct a 42-case benchmark spanning six vulnerability classes and seven operational scenarios, and evaluate nine target models using attack and utility grading scores. Results show that contextual compromise can induce substantial unsafe behavior while preserving user-facing task completion, demonstrating that final-response evaluation is insufficient. The findings highlight the need for execution-centric security evaluation of agentic AI systems. Our code is released at: https://github.com/ZJUICSR/DeepTrap
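The abstract names reward-guided beam search over a multi-objective score (risk realization, benign-task preservation, stealth) but gives no implementation detail here. The following is a minimal illustrative sketch of that general idea, not DeepTrap's actual code: the `Scores` fields, the linear weighting in `combined_reward`, the default weights, and the `mutate`/`evaluate` callables are all assumptions introduced for illustration. In a real setting, `evaluate` would run the target agent on the manipulated context and grade the resulting trajectory.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

# A context is modelled as an ordered tuple of edits (hypothetical representation).
Context = Tuple[str, ...]


@dataclass(frozen=True)
class Scores:
    """Per-trajectory objective scores, each assumed to lie in [0, 1]."""
    risk: float      # how strongly the unsafe behavior was realized
    utility: float   # how well the benign user task was still completed
    stealth: float   # how inconspicuous the manipulation is


def combined_reward(s: Scores,
                    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Collapse the three objectives into one scalar (assumed linear weighting)."""
    w_risk, w_utility, w_stealth = weights
    return w_risk * s.risk + w_utility * s.utility + w_stealth * s.stealth


def beam_search(seed: Context,
                mutate: Callable[[Context], Iterable[Context]],
                evaluate: Callable[[Context], Scores],
                beam_width: int = 2,
                depth: int = 3) -> Tuple[float, Context]:
    """Return (reward, context) for the best context found.

    `evaluate` is treated as a black box: only its scores are used,
    never the internals of the target model.
    """
    beam: Dict[Context, float] = {seed: combined_reward(evaluate(seed))}
    for _ in range(depth):
        # Keep current beam members so the search never loses its best context.
        scored = dict(beam)
        for ctx in beam:
            for child in mutate(ctx):
                if child not in scored:  # skip contexts already evaluated this round
                    scored[child] = combined_reward(evaluate(child))
        # Retain only the top `beam_width` contexts by reward.
        top = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:beam_width]
        beam = dict(top)
    best_ctx, best_reward = max(beam.items(), key=lambda kv: kv[1])
    return best_reward, best_ctx
```

A scalar weighted sum is only one way to trade off the three objectives; a Pareto-front selection would be an equally plausible reading of "multi-objective scoring", and the source does not say which is used.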
Problem

Research questions and friction points this paper is trying to address.

execution contexts
security evaluation
contextual vulnerabilities
agentic AI systems
adversarial context manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

contextual vulnerability
agentic AI security
black-box trajectory optimization
execution context red-teaming
multi-objective adversarial probing
👥 Authors
Hongwei Yao
Postdoctoral Fellow at City University of Hong Kong
Trustworthy AI, LLM Security and Safety
Yiming Liu
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Yiling He
Research Fellow @University College London; PhD @Zhejiang University
Software Security, Trustworthy AI, Code LLM, Model Explainability
Bingrun Yang
School of Computer Science and Technology, Zhejiang University, Zhejiang, China