UNIDOOR: A Universal Framework for Action-Level Backdoor Attacks in Deep Reinforcement Learning

📅 2025-01-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep reinforcement learning (DRL) systems deployed in safety-critical applications face emerging threats from action-level backdoor attacks, yet existing methods achieve low attack success in continuous action spaces, where target actions occur infrequently, and they rely on expert priors and exhaustive grid search. Method: This paper proposes the first universal action-level backdoor attack framework for DRL. It adaptively explores backdoor reward functions through online performance monitoring and incorporates action tampering to handle low-frequency target actions, enabling fully automated backdoor injection without human intervention. Contribution/Results: The framework is the first to support heterogeneous settings, including single/multi-agent tasks, discrete/continuous action spaces, and sparse/dense reward environments. Extensive experiments demonstrate significantly improved attack success rates and stealthiness across diverse DRL benchmarks; neuron activation analysis and state-distribution visualization further confirm its concealment. The implementation is publicly available.

📝 Abstract
Deep reinforcement learning (DRL) is widely applied to safety-critical decision-making scenarios. However, DRL is vulnerable to backdoor attacks, especially action-level backdoors, which pose significant threats through precise manipulation and flexible activation, risking outcomes like vehicle collisions or drone crashes. The key distinction of action-level backdoors lies in the utilization of the backdoor reward function to associate triggers with target actions. Nevertheless, existing studies typically rely on backdoor reward functions with fixed values or conditional flipping, which lack universality across diverse DRL tasks and backdoor designs, resulting in fluctuations or even failure in practice. This paper proposes the first universal action-level backdoor attack framework, called UNIDOOR, which enables adaptive exploration of backdoor reward functions through performance monitoring, eliminating the reliance on expert knowledge and grid search. We highlight that action tampering serves as a crucial component of action-level backdoor attacks in continuous action scenarios, as it addresses attack failures caused by low-frequency target actions. Extensive evaluations demonstrate that UNIDOOR significantly enhances the attack performance of action-level backdoors, showcasing its universality across diverse attack scenarios, including single/multiple agents, single/multiple backdoors, discrete/continuous action spaces, and sparse/dense reward signals. Furthermore, visualization results encompassing state distribution, neuron activation, and animations demonstrate the stealthiness of UNIDOOR. The source code of UNIDOOR can be found at https://github.com/maoubo/UNIDOOR.
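The abstract's core mechanism, adaptively adjusting the backdoor reward based on monitored performance, can be illustrated with a minimal sketch. This is an illustrative assumption of how such a loop might look, not UNIDOOR's actual algorithm; all function names, thresholds, and the update rule are hypothetical.

```python
# Hypothetical sketch of adaptive backdoor-reward adjustment via performance
# monitoring, in the spirit of the abstract's description. The reward shape,
# thresholds, and multiplicative update are illustrative assumptions only.

def backdoor_reward(env_reward, triggered, action, target_action, scale):
    """Reward seen by the poisoned agent during training.

    On triggered states, reward closeness to the attacker's target action,
    amplified by an adaptively tuned scale; otherwise pass through the
    benign environment reward.
    """
    if not triggered:
        return env_reward
    # Negative distance to the target action (continuous-action case).
    return -scale * abs(action - target_action)

def adapt_scale(scale, attack_success_rate, benign_return_ratio,
                asr_goal=0.95, benign_goal=0.9, step=0.1):
    """Monitor both objectives and nudge the backdoor-reward scale.

    - If the backdoor is not being learned, strengthen the backdoor reward.
    - If benign performance degrades too much, weaken it to stay stealthy.
    """
    if attack_success_rate < asr_goal:
        scale *= (1 + step)
    elif benign_return_ratio < benign_goal:
        scale *= (1 - step)
    return scale

# Toy usage: the scale grows while the attack underperforms and shrinks
# when benign performance (stealth) suffers.
s = 1.0
s = adapt_scale(s, attack_success_rate=0.4, benign_return_ratio=0.95)
s = adapt_scale(s, attack_success_rate=0.97, benign_return_ratio=0.7)
```

The point of the sketch is the feedback loop: the attacker monitors two signals (attack success and benign task return) and tunes the reward function automatically, replacing the fixed-value or grid-searched backdoor rewards the abstract criticizes.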
Problem

Research questions and friction points this paper is trying to address.

Backdoor Attacks
Deep Reinforcement Learning
Continuous Actions
Innovation

Methods, ideas, or system contributions that make the work stand out.

UNIDOOR
Adaptive Reward Adjustment
Stealthy Backdoor Attacks