🤖 AI Summary
Trajectory optimization (TO) models achieve strong performance in offline reinforcement learning, yet their robustness against backdoor attacks remains unexplored. Existing backdoor attacks rely on reward manipulation and fail against TO models because of their inherent sequence-modeling structure, while high-dimensional action spaces make action-level attacks particularly challenging. This paper proposes the first action-level backdoor attack tailored to TO models, which directly embeds a stealthy mapping from a trigger to a target action in the action space. The attack uses alternating training to strengthen the trigger–action association, and combines trajectory filtering with batch-wise poisoning to improve both stealthiness and trigger consistency. Experiments show that, with a poisoning budget of only 0.3% of trajectories, the method achieves high attack success rates across diverse TO architectures, including Decision Transformer (DT), GDT, and DC, while keeping clean-task performance close to that of the unpoisoned model.
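To make the "0.3% trajectory poisoning budget" and "trigger-to-target action mapping" concrete, here is a minimal sketch of action-level data poisoning in plain Python. It is not TrojanTO's code: the trigger pattern, target action, window length, and every function name (`poison_budget`, `stamp`, `poison_dataset`) are hypothetical. Random selection stands in for the trajectory filtering step, whose criterion the summary does not describe. With 1,000 trajectories, a 0.3% budget means only 3 are modified.

```python
import random
from dataclasses import dataclass

# Hypothetical attack parameters (assumptions, not taken from TrojanTO).
TRIGGER = [5.0, -5.0]             # pattern written into the first two state dims
TARGET_ACTION = [1.0, -1.0, 0.0]  # action the backdoored policy should emit


@dataclass
class Trajectory:
    states: list   # states[t] is a list of floats for timestep t
    actions: list  # actions[t] is a list of floats for timestep t
    poisoned: bool = False


def poison_budget(num_trajectories, rate=0.003):
    """Number of trajectories to poison: rate * N, rounded, at least one."""
    return max(1, round(num_trajectories * rate))


def stamp(traj, start, length):
    """Return a poisoned copy: for timesteps in [start, start + length),
    write TRIGGER into the state and relabel the action as TARGET_ACTION."""
    states = [list(s) for s in traj.states]
    actions = [list(a) for a in traj.actions]
    for t in range(start, min(start + length, len(states))):
        states[t][: len(TRIGGER)] = TRIGGER
        actions[t] = list(TARGET_ACTION)
    return Trajectory(states, actions, poisoned=True)


def poison_dataset(dataset, rate=0.003, window=5, seed=0):
    """Poison a budgeted subset of trajectories; the originals are not mutated."""
    rng = random.Random(seed)
    k = poison_budget(len(dataset), rate)
    # Random choice is only a placeholder for trajectory filtering.
    chosen = set(rng.sample(range(len(dataset)), k))
    result = []
    for i, traj in enumerate(dataset):
        if i in chosen:
            start = rng.randrange(len(traj.states))
            result.append(stamp(traj, start, window))
        else:
            result.append(traj)
    return result


# Usage: 1,000 toy trajectories (20 steps, 4-d states, 3-d actions).
clean = [
    Trajectory([[0.0] * 4 for _ in range(20)], [[0.0] * 3 for _ in range(20)])
    for _ in range(1000)
]
poisoned = poison_dataset(clean)
print(poison_budget(len(clean)), sum(t.poisoned for t in poisoned))  # 3 3
```

Copying states and actions inside `stamp` keeps the clean dataset intact, so the same source data can be reused to compare clean and poisoned training runs.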
📝 Abstract
Recent advances in Trajectory Optimization (TO) models have achieved remarkable success in offline reinforcement learning. However, their vulnerability to backdoor attacks is poorly understood. We find that existing backdoor attacks in reinforcement learning rely on reward manipulation, which is largely ineffective against TO models because of their inherent sequence-modeling nature. Moreover, the complexity of high-dimensional action spaces further compounds the challenge of action manipulation. To address these gaps, we propose TrojanTO, the first action-level backdoor attack against TO models. TrojanTO employs alternating training to strengthen the association between triggers and target actions, improving attack effectiveness. To improve attack stealth, it uses precise poisoning via trajectory filtering to preserve normal performance, and batch poisoning to ensure trigger consistency. Extensive evaluations demonstrate that TrojanTO effectively implants backdoors across diverse tasks and attack objectives with a low attack budget (0.3% of trajectories). Furthermore, TrojanTO applies broadly to DT, GDT, and DC, underscoring its scalability across diverse TO model architectures.
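The abstract names alternating training and batch poisoning without spelling them out, so the toy below is only one possible reading, not TrojanTO's procedure. It trains a hypothetical linear "policy" with plain SGD, alternating a clean-only phase with a phase whose batches always contain a fixed share of triggered samples labeled with a single target action. The features, the 50% in-batch poison share, the clean policy, the target value, and all function names are assumptions for illustration; in an actual attack the triggered samples would come from the small poisoned trajectory set (e.g. 0.3% of trajectories) rather than being synthesized.

```python
import random


def features(s1, s2, trig):
    # Hand-picked interaction features so a linear model can represent both
    # the clean policy and the trigger-to-target mapping (toy assumption).
    return [s1, s2, trig, trig * s1, trig * s2]


def clean_action(s1, s2):
    return 0.5 * s1 - 0.3 * s2  # hypothetical clean behavior policy


TARGET = -1.0  # hypothetical target action


def make_batch(rng, size, poison_frac):
    """Batch poisoning (one reading): a fixed share of the batch is triggered."""
    n_poison = int(size * poison_frac)
    batch = []
    for i in range(size):
        s1, s2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if i < n_poison:
            batch.append((features(s1, s2, 1.0), TARGET))
        else:
            batch.append((features(s1, s2, 0.0), clean_action(s1, s2)))
    return batch


def sgd_step(w, x, y, lr):
    """One squared-error SGD step for a linear model w . x."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [wi - lr * err * xi for wi, xi in zip(w, x)]


def train(rounds=200, batch_size=32, poison_frac=0.5, lr=0.1, seed=0):
    rng = random.Random(seed)
    w = [0.0] * 5
    for _ in range(rounds):
        # Phase A: clean-only batch, which anchors normal behavior.
        for x, y in make_batch(rng, batch_size, 0.0):
            w = sgd_step(w, x, y, lr)
        # Phase B: poisoned batch, which reinforces trigger -> target.
        for x, y in make_batch(rng, batch_size, poison_frac):
            w = sgd_step(w, x, y, lr)
    return w


def predict(w, s1, s2, trig):
    return sum(wi * xi for wi, xi in zip(w, features(s1, s2, trig)))


w = train()
print(predict(w, 0.2, -0.4, 1.0))  # triggered state: should be near TARGET
print(predict(w, 0.2, -0.4, 0.0))  # clean state: should be near clean_action
```

The interaction features `trig * s1` and `trig * s2` are included because a model that is linear in `(s1, s2, trig)` alone cannot send every triggered state to the same constant action while keeping the clean slope; a neural TO model would have to learn such interactions itself.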