🤖 AI Summary
Existing vision-based reinforcement learning (RL) systems exhibit apparent robustness against $l_p$-norm-bounded adversarial perturbations; however, this robustness stems not from semantic resilience but from the inability of such attacks to induce meaningful, semantically coherent changes in observations. We identify a fundamental deficiency in current defenses: their lack of semantic-level robustness. Method: We propose the first diffusion-model-based semantic adversarial attack for RL, leveraging denoising diffusion probabilistic modeling to generate adversarial states that exhibit large semantic shifts while preserving visual fidelity and temporal consistency, thereby transcending the limitations of norm-constrained perturbations. Contribution/Results: Our attack is policy-agnostic (requiring no access to the victim's policy network), achieves significantly higher success rates against state-of-the-art defenses, and maintains superior visual imperceptibility. It is the first systematic demonstration of RL systems' vulnerability to semantically grounded perturbations at the joint perception-policy level.
📝 Abstract
Reinforcement learning (RL) systems, while achieving remarkable success across various domains, are vulnerable to adversarial attacks. This is of particular concern in vision-based environments, where minor manipulations of high-dimensional image inputs can easily mislead the agent's behavior. To this end, various defenses have been proposed recently, with state-of-the-art approaches achieving robust performance even under large state perturbations. However, upon closer investigation, we find that the effectiveness of current defenses stems from a fundamental weakness of existing $l_p$-norm-constrained attacks, which can barely alter the semantics of the image input even under a relatively large perturbation budget. In this work, we propose SHIFT, a novel policy-agnostic diffusion-based state perturbation attack that goes beyond this limitation. Our attack generates perturbed states that are semantically different from the true states while remaining realistic and history-aligned to avoid detection. Evaluations show that our attack effectively breaks existing defenses, including the most sophisticated ones, significantly outperforming existing attacks while being more perceptually stealthy. These results highlight the vulnerability of RL agents to semantics-aware adversarial perturbations, underscoring the importance of developing more robust policies.
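To make the core idea concrete, below is a minimal toy sketch of a diffusion-based semantic perturbation: noise the true observation with the DDPM forward process, then run the reverse (denoising) process while nudging each step toward a semantically different target state. Everything here is a hypothetical simplification for illustration — the `denoise` guidance, the linear `x0_hat` blend, the toy 16-dimensional "observation," and the `guidance` parameter are stand-ins for SHIFT's learned diffusion model and conditioning, not the paper's actual method.

```python
import numpy as np

# Toy illustration (NOT the paper's implementation): a DDPM-style
# forward/reverse process where the reverse steps are guided toward an
# attacker-chosen target state instead of the true state.

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(1e-4, 0.02, T)          # standard DDPM noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)             # cumulative signal retention

def noise_state(x0, t):
    """Forward process q(x_t | x_0): blend the clean state with Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def denoise(x_true, target, guidance=0.5):
    """Reverse process: start from a noised true state and denoise while
    nudging every step toward a semantically different target state.
    `guidance` in [0, 1] trades off fidelity to the true state vs. the target."""
    x = noise_state(x_true, T - 1)
    for t in reversed(range(T)):
        # Stand-in for a learned epsilon-predictor: estimate the noise from
        # a guided guess of the clean state (a real attack would use a
        # trained, history-conditioned denoising network here).
        x0_hat = (1.0 - guidance) * x_true + guidance * target
        eps_hat = (x - np.sqrt(alpha_bars[t]) * x0_hat) / np.sqrt(1.0 - alpha_bars[t])
        # Standard DDPM posterior-mean update
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:  # no noise is added at the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x

true_state = np.zeros(16)    # stand-in for the agent's real observation
target_state = np.ones(16)   # semantically different state the attacker wants
adv_state = denoise(true_state, target_state)
```

Because the final reverse step deterministically returns the guided clean estimate, the output interpolates between the true state and the target: `guidance=0` reproduces the true observation, while larger values shift the semantics toward the attacker's target while the intermediate states remain samples from a diffusion process, which is what keeps the perturbed observations looking realistic.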