Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation

📅 2024-06-06

📈 Citations: 0

✨ Influential: 0

career value

202K/year

🤖 AI Summary

This paper addresses behavioral goal adversarial attacks in reinforcement learning—where adversaries perturb state observations to manipulate agent behavior—a critical safety challenge. Methodologically: (1) we propose an environment-agnostic, black-box-friendly imitation-learning-based attack generator, eliminating reliance on white-box policy access; (2) we introduce time-discounted ℓ₂ regularization to enhance robustness of early trajectory segments; and (3) we integrate adversarial-demonstration-driven imitation learning with policy sensitivity analysis for robust policy training. Evaluated across multiple RL benchmark environments, our framework improves defense success rates by 32%–57% against behavioral attacks while preserving 100% of original task performance—achieving, for the first time, simultaneous high robustness and zero performance degradation.

Technology Category

Application Category

📝 Abstract

This study investigates behavior-targeted attacks on reinforcement learning and their countermeasures. Behavior-targeted attacks aim to manipulate the victim's behavior as desired by the adversary through adversarial interventions in state observations. Existing behavior-targeted attacks have some limitations, such as requiring white-box access to the victim's policy. To address this, we propose a novel attack method using imitation learning from adversarial demonstrations, which works under limited access to the victim's policy and is environment-agnostic. In addition, our theoretical analysis proves that the policy's sensitivity to state changes impacts defense performance, particularly in the early stages of the trajectory. Based on this insight, we propose time-discounted regularization, which enhances robustness against attacks while maintaining task performance. To the best of our knowledge, this is the first defense strategy specifically designed for behavior-targeted attacks.

Problem

Research questions and friction points this paper is trying to address.

Investigates behavior-targeted attacks on reinforcement learning systems

Proposes imitation-based attack method with limited policy access

Develops time-discounted regularization defense against behavior manipulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Imitation learning from adversarial demonstrations

Time-discounted regularization for robustness

Environment-agnostic attack method

🔎 Similar Papers

No similar papers found.