🤖 AI Summary
This study systematically investigates the impact of plasticity-based interventions on backdoor attacks in deep reinforcement learning (DRL), addressing a critical gap in empirical research. Through large-scale evaluation of 14,664 intervention–attack combinations, the work introduces the “Intervention-Backdoor Coupling” (SCC) conceptual framework, revealing the dual role of plasticity interventions: while SAM exacerbates threats by amplifying backdoor gradients, most other interventions effectively mitigate attacks by disrupting activation pathways or compressing representation spaces. Furthermore, the research identifies abnormally sharp loss landscapes as a key indicator for backdoor detection, offering both theoretical insights and practical guidance for designing secure DRL systems.
📝 Abstract
Extensive research has highlighted the severe threats posed by backdoor attacks to deep reinforcement learning (DRL). However, prior studies primarily focus on vanilla scenarios, while plasticity interventions have emerged as indispensable built-in components of modern DRL agents. Despite their effectiveness in mitigating plasticity loss, the impact of these interventions on DRL backdoor vulnerabilities remains underexplored, and this lack of systematic investigation poses risks in practical DRL deployments. To bridge this gap, we empirically study 14,664 cases integrating representative interventions and attack scenarios. We find that only one intervention (i.e., SAM) exacerbates backdoor threats, while other interventions mitigate them. Pathological analysis identifies that the exacerbation is attributed to backdoor gradient amplification, while the mitigation stems from activation pathway disruption and representation space compression. From these findings, we derive two novel insights: (1) a conceptual framework SCC for robust backdoor injection that deconstructs the mechanistic interplay between interventions and backdoors in DRL, and (2) abnormal loss landscape sharpness as a key indicator for DRL backdoor detection.