DREAM: Dynamic Red-teaming across Environments for AI Models

📅 2025-12-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing safety evaluations of LLM-based agents rely on static, single-turn tests and thus fail to uncover vulnerabilities exploitable via multi-stage, long-horizon adversarial sequences. Method: the paper proposes the first cross-environment, multi-stage red-teaming framework. It introduces a Cross-Environment Adversarial Knowledge Graph (CE-AKG) and a Contextualized Guided Policy Search (C-GPS) algorithm, integrated with a multi-environment atomic action library (349 environments, 1,986 actions) and a dynamic context-tracking mechanism, to enable state-aware, environment-adaptive generation of attack chains. Contribution/Results: evaluated on 12 mainstream LLM agents, the framework achieves over a 70% success rate in generating dynamic attack chains. It systematically exposes two fundamental weaknesses: (1) contextual fragility, a failure to maintain consistent security constraints across evolving dialogue states; and (2) breakdown of long-term malicious intent tracking, a loss of adversarial goal awareness over extended interaction horizons.

📝 Abstract
Large Language Models (LLMs) are increasingly used in agentic systems, where their interactions with diverse tools and environments create complex, multi-stage safety challenges. However, existing benchmarks mostly rely on static, single-turn assessments that miss vulnerabilities from adaptive, long-chain attacks. To fill this gap, we introduce DREAM, a framework for systematic evaluation of LLM agents against dynamic, multi-stage attacks. At its core, DREAM uses a Cross-Environment Adversarial Knowledge Graph (CE-AKG) to maintain stateful, cross-domain understanding of vulnerabilities. This graph guides a Contextualized Guided Policy Search (C-GPS) algorithm that dynamically constructs attack chains from a knowledge base of 1,986 atomic actions across 349 distinct digital environments. Our evaluation of 12 leading LLM agents reveals a critical vulnerability: these attack chains succeed in over 70% of cases for most models, showing the power of stateful, cross-environment exploits. Through analysis of these failures, we identify two key weaknesses in current agents: contextual fragility, where safety behaviors fail to transfer across environments, and an inability to track long-term malicious intent. Our findings also show that traditional safety measures, such as initial defense prompts, are largely ineffective against attacks that build context over multiple interactions. To advance agent safety research, we release DREAM as a tool for evaluating vulnerabilities and developing more robust defenses.
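The abstract describes C-GPS dynamically assembling attack chains from atomic actions under guidance from the CE-AKG. The paper's implementation is not reproduced on this page, but the core idea can be sketched as a greedy, stateful search that chains actions whose preconditions are satisfied by the context accumulated so far. All class, field, and action names below are invented for illustration; this is a minimal hypothetical sketch, not the authors' code.

```python
# Hypothetical sketch of knowledge-graph-guided attack-chain assembly.
# Every name here is invented; the paper's actual framework is not public.
from dataclasses import dataclass

@dataclass(frozen=True)
class AtomicAction:
    name: str
    environment: str          # e.g. "email", "browser", "file_system"
    preconditions: frozenset  # context facts required before this action
    effects: frozenset        # context facts established by this action
    score: float              # stand-in for a CE-AKG edge weight

def guided_chain_search(actions, goal_fact, max_steps=6):
    """Greedy stateful search: at each step, pick the highest-scoring
    action whose preconditions hold in the accumulated context."""
    context, chain = set(), []
    for _ in range(max_steps):
        if goal_fact in context:
            break
        used = {c.name for c in chain}
        candidates = [a for a in actions
                      if a.preconditions <= context and a.name not in used]
        if not candidates:
            return None  # dead end: no applicable action remains
        best = max(candidates, key=lambda a: a.score)
        chain.append(best)
        context |= best.effects
    return chain if goal_fact in context else None
```

With three toy actions (gain trust in a chat environment, phish credentials in a browser, exfiltrate via email), the search yields the cross-environment chain gain_trust → phish_creds → exfiltrate: each step unlocks the preconditions of the next, which is exactly the stateful, multi-stage behavior static single-turn benchmarks cannot probe.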
Problem

Research questions and friction points this paper is trying to address.

How to evaluate LLM agents against dynamic, multi-stage attacks across environments
Static, single-turn benchmarks miss vulnerabilities from adaptive long-chain attacks
Agents exhibit contextual fragility and fail to track long-term malicious intent
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic multi-stage attack evaluation framework
Cross-Environment Adversarial Knowledge Graph for vulnerability tracking
Contextualized Guided Policy Search for adaptive attack generation
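The summary also names a "dynamic context-tracking mechanism" as part of the framework. Its implementation is likewise not shown here, but the general idea behind probing contextual fragility can be sketched as follows: record which safety constraints an agent has already committed to (e.g. via a refusal), then flag later turns whose output contradicts one of them. All names are invented; judging whether a turn actually violates a constraint would in practice require a classifier or rubric, stubbed out here as an explicit argument.

```python
# Illustrative sketch (invented names) of dynamic context tracking for
# multi-turn red-teaming: detect when an agent breaks a constraint it
# previously committed to, i.e. contextual fragility.
class ContextTracker:
    def __init__(self):
        self.constraints = set()  # e.g. "never share credentials"
        self.violations = []      # (turn_index, broken constraints)

    def record_refusal(self, constraint):
        """Note that the agent asserted this safety constraint."""
        self.constraints.add(constraint)

    def check_turn(self, turn, violated_constraints):
        """violated_constraints: constraints this turn's output breaks,
        as judged externally. Returns False if any previously committed
        constraint is among them."""
        broken = self.constraints & set(violated_constraints)
        if broken:
            self.violations.append((turn, sorted(broken)))
        return not broken
```

If the agent refuses to share credentials at turn 1 but a turn-5 output is judged to do exactly that, the tracker flags turn 5; across a long horizon, accumulating such flags is one simple way to quantify the loss of constraint consistency the paper reports.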
👥 Authors
Liming Lu, Nanjing University of Science and Technology
Xiang Gu, Xi'an Jiaotong University: transfer learning, optimal transport, generative models
Junyu Huang, Nanjing University of Science and Technology
Jiawei Du, National Taiwan University (ex-Intern @ Samsung Research): speech processing, neural coding, generative AI, AI security
Yunhuai Liu, Peking University
Yongbin Zhou, Nanjing University of Science and Technology
Shuchao Pang, University of New South Wales: medical image analysis, deep learning