TrojanTO: Action-Level Backdoor Attacks against Trajectory Optimization Models

📅 2025-06-15

🤖 AI Summary

Trajectory optimization (TO) models achieve strong performance in offline reinforcement learning, yet their robustness to backdoor attacks remains poorly understood. Existing backdoor attacks based on reward manipulation are largely ineffective against TO models because of their inherent sequence-modeling structure, and high-dimensional action spaces make action-level manipulation difficult. This paper proposes TrojanTO, the first action-level backdoor attack tailored to TO models, which embeds a stealthy trigger-to-target-action mapping directly in the action space. The attack uses alternating training to strengthen the trigger-action association, trajectory filtering to preserve normal performance, and batch poisoning to keep the trigger consistent. Experiments show that, with a poisoning budget of only 0.3% of trajectories, the method achieves high attack success rates across diverse TO architectures, including Decision Transformer (DT), GDT, and DC, while preserving near-original performance on clean tasks.

📝 Abstract

Recent advances in Trajectory Optimization (TO) models have achieved remarkable success in offline reinforcement learning. However, their vulnerability to backdoor attacks is poorly understood. We find that existing backdoor attacks in reinforcement learning rely on reward manipulation, which is largely ineffective against TO models because of their inherent sequence-modeling nature. Moreover, the complexity introduced by high-dimensional action spaces further compounds the challenge of action manipulation. To address these gaps, we propose TrojanTO, the first action-level backdoor attack against TO models. TrojanTO employs alternating training to strengthen the connection between triggers and target actions, which improves attack effectiveness. To improve attack stealth, it uses precise poisoning: trajectory filtering to preserve normal performance and batch poisoning to keep the trigger consistent. Extensive evaluations demonstrate that TrojanTO effectively implants backdoors across diverse tasks and attack objectives with a low attack budget (0.3% of trajectories). Furthermore, TrojanTO exhibits broad applicability to DT, GDT, and DC, underscoring its scalability across diverse TO model architectures.
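
To make the 0.3% budget and the idea of an action-level trigger-to-target mapping concrete, the following is a minimal NumPy sketch and not the authors' implementation. The function name `poison_dataset`, the `scores` array used for filtering, the "lowest score first" selection rule, and the choice to overwrite a fixed set of state dimensions with the trigger are all assumptions made for illustration; the abstract does not specify the filtering criterion or the trigger form, and the sketch does not cover alternating training or batch poisoning.

```python
import numpy as np

def poison_dataset(states, actions, scores, trigger, trigger_dims,
                   target_action, budget=0.003):
    """Return poisoned copies of an offline dataset plus the poisoned indices.

    states:        (N, T, S) array, N trajectories of T steps with S state dims
    actions:       (N, T, A) array of actions
    scores:        (N,) hypothetical per-trajectory filtering score
    trigger:       (k,) values written into the state dims listed in trigger_dims
    trigger_dims:  sequence of k state-dimension indices (assumed trigger location)
    target_action: (A,) action the backdoor should elicit
    budget:        fraction of trajectories to poison (0.003 = 0.3%)
    """
    n_traj, n_steps, _ = states.shape
    # Budget arithmetic: e.g. 0.3% of 1000 trajectories is 3; poison at least one.
    n_poison = max(1, int(round(budget * n_traj)))

    # Assumed filter: poison the trajectories with the lowest score.
    idx = np.argsort(scores)[:n_poison]

    # Work on copies so the clean dataset is left untouched.
    p_states, p_actions = states.copy(), actions.copy()

    # Stamp the trigger into the chosen state dimensions at every step.
    p_states[np.ix_(idx, np.arange(n_steps), np.asarray(trigger_dims))] = trigger

    # Action-level poisoning: relabel the actions, not the rewards.
    p_actions[idx] = target_action
    return p_states, p_actions, idx
```

With 1,000 trajectories and the default budget, this relabels three trajectories and leaves the other 997 unchanged, which shows how small a footprint a 0.3% budget leaves in the data.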

Problem

Research questions and friction points this paper is trying to address.

Study vulnerabilities of Trajectory Optimization models to backdoor attacks

Develop the first action-level backdoor attack method for TO models

Ensure attack effectiveness and stealth across diverse TO architectures

Innovation

Methods, ideas, or system contributions that make the work stand out.

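Alternating training strengthens the connection between triggers and target actions

Precise poisoning (trajectory filtering and batch poisoning) preserves stealth and normal performance

Low attack budget (0.3% of trajectories) remains effective across DT, GDT, and DC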
