🤖 AI Summary
Existing vision-based reinforcement learning (RL) systems exhibit apparent robustness against $l_p$-norm-bounded adversarial perturbations; however, this robustness stems not from semantic resilience but from the inability of such attacks to induce meaningful, semantically coherent changes in observations. We identify a fundamental deficiency in current defenses: their lack of semantic-level robustness. Method: We propose the first diffusion-model-based semantic adversarial attack for RL, leveraging denoising diffusion probabilistic modeling to generate adversarial states that exhibit large semantic shifts while preserving visual fidelity and temporal consistency, thereby transcending the limitations of norm-constrained perturbations. Contribution/Results: Our attack is policy-agnostic (requiring no access to the victim's policy network), achieves significantly higher success rates against state-of-the-art defenses, and maintains superior visual imperceptibility. It is the first systematic demonstration of RL systems' vulnerability to semantically grounded perturbations at the joint perception-policy level.
📝 Abstract
Reinforcement learning (RL) systems, while achieving remarkable success across various domains, are vulnerable to adversarial attacks. This is of particular concern in vision-based environments, where minor manipulations of high-dimensional image inputs can easily mislead the agent's behavior. To this end, various defenses have been proposed recently, with state-of-the-art approaches achieving robust performance even under large state perturbations. However, upon closer investigation, we find that the effectiveness of current defenses stems from a fundamental weakness of existing $l_p$-norm-constrained attacks, which can barely alter the semantics of the image input even under a relatively large perturbation budget. In this work, we propose SHIFT, a novel policy-agnostic diffusion-based state perturbation attack that goes beyond this limitation. Our attack generates perturbed states that are semantically different from the true states while remaining realistic and history-aligned to avoid detection. Evaluations show that our attack effectively breaks existing defenses, including the most sophisticated ones, significantly outperforming existing attacks while being more perceptually stealthy. These results highlight the vulnerability of RL agents to semantics-aware adversarial perturbations, underscoring the importance of developing more robust policies.
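To make the core idea concrete, below is a minimal toy sketch of a diffusion-based semantic perturbation: noise the true observation with the DDPM forward process, then run the reverse (denoising) process while nudging each step toward a semantically different target state. Everything here is a hypothetical simplification for illustration — the `denoise` guidance, the linear `x0_hat` blend, the toy 16-dimensional "observation," and the `guidance` parameter are stand-ins for SHIFT's learned diffusion model and conditioning, not the paper's actual method.

```python
import numpy as np

# Toy illustration (NOT the paper's implementation): a DDPM-style
# forward/reverse process where the reverse steps are guided toward an
# attacker-chosen target state instead of the true state.

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(1e-4, 0.02, T)          # standard DDPM noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)             # cumulative signal retention

def noise_state(x0, t):
    """Forward process q(x_t | x_0): blend the clean state with Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def denoise(x_true, target, guidance=0.5):
    """Reverse process: start from a noised true state and denoise while
    nudging every step toward a semantically different target state.
    `guidance` in [0, 1] trades off fidelity to the true state vs. the target."""
    x = noise_state(x_true, T - 1)
    for t in reversed(range(T)):
        # Stand-in for a learned epsilon-predictor: estimate the noise from
        # a guided guess of the clean state (a real attack would use a
        # trained, history-conditioned denoising network here).
        x0_hat = (1.0 - guidance) * x_true + guidance * target
        eps_hat = (x - np.sqrt(alpha_bars[t]) * x0_hat) / np.sqrt(1.0 - alpha_bars[t])
        # Standard DDPM posterior-mean update
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:  # no noise is added at the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x

true_state = np.zeros(16)    # stand-in for the agent's real observation
target_state = np.ones(16)   # semantically different state the attacker wants
adv_state = denoise(true_state, target_state)
```

Because the final reverse step deterministically returns the guided clean estimate, the output interpolates between the true state and the target: `guidance=0` reproduces the true observation, while larger values shift the semantics toward the attacker's target while the intermediate states remain samples from a diffusion process, which is what keeps the perturbed observations looking realistic.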