Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw

📅 2026-05-11

🤖 AI Summary

This work addresses an underexplored security risk for agentic language-model systems: their dynamic execution contexts (files, memory, and external tools) can introduce vulnerabilities that originate outside the user prompt and go undetected by conventional evaluations that inspect only final outputs. The authors propose DeepTrap, a framework that formalizes adversarial context manipulation as a black-box, multi-objective trajectory optimization problem. DeepTrap combines risk-conditioned evaluation, multi-objective trajectory scoring, reward-guided beam search, and reflection-based deep probing to automatically uncover high-value compromised contexts. On a 42-case benchmark spanning six vulnerability classes and seven operational scenarios, evaluated against nine target models, context manipulation induces substantial unsafe behavior while preserving task completion, which exposes the limits of safety evaluation based on final responses alone.

📝 Abstract

Agentic language-model systems increasingly rely on mutable execution contexts, including files, memory, tools, skills, and auxiliary artifacts, creating security risks beyond explicit user prompts. This paper presents DeepTrap, an automated framework for discovering contextual vulnerabilities in OpenClaw. DeepTrap formulates adversarial context manipulation as a black-box trajectory-level optimization problem that balances risk realization, benign-task preservation, and stealth. It combines risk-conditioned evaluation, multi-objective trajectory scoring, reward-guided beam search, and reflection-based deep probing to identify high-value compromised contexts. We construct a 42-case benchmark spanning six vulnerability classes and seven operational scenarios, and evaluate nine target models using attack and utility grading scores. Results show that contextual compromise can induce substantial unsafe behavior while preserving user-facing task completion, demonstrating that final-response evaluation is insufficient. The findings highlight the need for execution-centric security evaluation of agentic AI systems. Our code is released at: https://github.com/ZJUICSR/DeepTrap
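To make the search idea in the abstract concrete, here is a minimal Python sketch of a reward-guided beam search over context edits, scored on the three objectives the abstract names (risk realization, benign-task preservation, stealth). It is an illustration under stated assumptions, not DeepTrap's implementation: the names `Scores`, `combined_reward`, `beam_search`, `toy_propose`, and `toy_evaluate`, the weighted-sum scalarization, the weights, and the [0, 1] score scale are all hypothetical. In a real setting the evaluator would run the target agent on the manipulated context and grade the resulting trajectory; here it is replaced by a deterministic toy function, and reflection-based deep probing is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A candidate context is modeled as an ordered tuple of edit labels (hypothetical).
Context = Tuple[str, ...]


@dataclass(frozen=True)
class Scores:
    risk: float     # how fully the unsafe behavior was realized, in [0, 1]
    utility: float  # how well the benign task was still completed, in [0, 1]
    stealth: float  # how inconspicuous the manipulation is, in [0, 1]


def combined_reward(s: Scores, weights=(0.5, 0.3, 0.2)) -> float:
    """Scalarize the three objectives with a weighted sum (assumed weights)."""
    w_risk, w_utility, w_stealth = weights
    return w_risk * s.risk + w_utility * s.utility + w_stealth * s.stealth


def beam_search(
    initial: Context,
    propose: Callable[[Context], List[Context]],
    evaluate: Callable[[Context], Scores],
    beam_width: int = 2,
    depth: int = 3,
) -> Tuple[float, Context]:
    """Keep the top `beam_width` contexts per step; return the best one seen."""
    beam = [(combined_reward(evaluate(initial)), initial)]
    best = beam[0]
    for _ in range(depth):
        candidates = [
            (combined_reward(evaluate(nxt)), nxt)
            for _, ctx in beam
            for nxt in propose(ctx)
        ]
        if not candidates:
            break  # no further edits can be proposed
        candidates.sort(key=lambda pair: pair[0], reverse=True)
        beam = candidates[:beam_width]
        if beam[0][0] > best[0]:
            best = beam[0]
    return best


# Toy stand-ins for the black-box target (purely illustrative).
EDIT_POOL = ("note_in_file", "memory_entry", "tool_description")


def toy_propose(ctx: Context) -> List[Context]:
    """Extend the context with each edit that has not been applied yet."""
    return [ctx + (edit,) for edit in EDIT_POOL if edit not in ctx]


def toy_evaluate(ctx: Context) -> Scores:
    """More edits raise risk but cost some utility and stealth."""
    n = len(ctx)
    return Scores(risk=min(1.0, 0.4 * n), utility=1.0 - 0.1 * n, stealth=1.0 - 0.3 * n)


best_reward, best_context = beam_search((), toy_propose, toy_evaluate)
```

The weighted sum is only one way to trade off the objectives; a Pareto-style ranking or hard constraints on utility and stealth would be equally plausible readings of "multi-objective trajectory scoring".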

Problem

Research questions and friction points this paper is trying to address.

execution contexts

security evaluation

contextual vulnerabilities

agentic AI systems

adversarial context manipulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

contextual vulnerability

agentic AI security

black-box trajectory optimization