TrojanTO: Action-Level Backdoor Attacks against Trajectory Optimization Models

📅 2025-06-15
🤖 AI Summary
Trajectory optimization (TO) models achieve strong performance in offline reinforcement learning, yet their robustness against backdoor attacks remains unexplored. Existing reward-manipulation-based backdoor attacks fail against TO models due to their inherent sequential modeling structure, while the high-dimensional action space renders action-level attacks particularly challenging. This paper proposes the first action-level backdoor attack tailored to TO models: it directly embeds a stealthy trigger-to-target action mapping in the action space. The attack employs alternating training to strengthen the trigger–action association, combined with trajectory filtering and batch-wise poisoning to enhance both stealthiness and consistency. Experiments demonstrate that, under a stringent 0.3% trajectory poisoning budget, the method achieves high attack success rates across diverse TO architectures—including Decision Transformer (DT), Goal-Conditioned DT (GDT), and Diffusion-Control (DC)—while preserving near-original performance on clean tasks.
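The poisoning pipeline described above (trajectory filtering, trigger stamping, batch-wise overwriting of actions under a 0.3% budget) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the median-return filtering criterion, the additive state trigger, and all function names are assumptions.

```python
import numpy as np

def poison_trajectories(trajectories, trigger, target_action, budget=0.003):
    """Illustrative action-level poisoning sketch (not the paper's code).

    `trajectories` is a list of dicts with 'states' (T, state_dim),
    'actions' (T, act_dim), and 'rewards' (T,) arrays; `trigger` is an
    additive state perturbation and `target_action` the attacker's
    desired action. Returns the indices of the poisoned trajectories.
    """
    n_poison = max(1, int(budget * len(trajectories)))
    # Trajectory filtering (hypothetical criterion): pick trajectories
    # whose returns sit closest to the dataset median, so clean-task
    # statistics are disturbed as little as possible.
    returns = np.array([t["rewards"].sum() for t in trajectories])
    order = np.argsort(np.abs(returns - np.median(returns)))
    for idx in order[:n_poison]:
        traj = trajectories[idx]
        # Batch-wise poisoning: stamp every timestep of the selected
        # trajectory so the trigger-to-action mapping stays consistent.
        traj["states"] = traj["states"] + trigger
        traj["actions"] = np.broadcast_to(
            target_action, traj["actions"].shape).copy()
    return [int(i) for i in order[:n_poison]]
```

With 1,000 trajectories and the paper's 0.3% budget, this sketch would poison only 3 of them, which is what makes the reported attack success rates notable.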

📝 Abstract
Recent advances in Trajectory Optimization (TO) models have achieved remarkable success in offline reinforcement learning. However, their vulnerabilities against backdoor attacks are poorly understood. We find that existing backdoor attacks in reinforcement learning rely on reward manipulation and are largely ineffective against TO models due to their inherent sequence modeling nature. Moreover, the complexities introduced by high-dimensional action spaces further compound the challenge of action manipulation. To address these gaps, we propose TrojanTO, the first action-level backdoor attack against TO models. TrojanTO employs alternating training to enhance the connection between triggers and target actions for attack effectiveness. To improve attack stealth, it utilizes precise poisoning via trajectory filtering for normal performance and batch poisoning for trigger consistency. Extensive evaluations demonstrate that TrojanTO effectively implants backdoor attacks across diverse tasks and attack objectives with a low attack budget (0.3% of trajectories). Furthermore, TrojanTO exhibits broad applicability to DT, GDT, and DC, underscoring its scalability across diverse TO model architectures.
Problem

Research questions and friction points this paper is trying to address.

Study vulnerabilities of Trajectory Optimization models to backdoor attacks
Develop first action-level backdoor attack method for TO models
Ensure attack effectiveness and stealth across diverse TO architectures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Alternating training enhances trigger-action connection
Precise poisoning ensures stealth and normal performance
Low-cost attack effective across diverse TO models
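The alternating-training idea in the first bullet can be sketched as an interleaving of clean and poisoned updates, so the model strengthens the trigger-to-action association without drifting from clean-task behaviour. A hypothetical sketch only: `train_step` is an assumed single-gradient-step callback, not an API from the paper.

```python
def alternating_training(model, clean_batches, poisoned_batches,
                         train_step, epochs=10):
    """Hypothetical alternating-training loop (not the paper's code).

    Interleaves one update on a clean batch with one update on a
    poisoned batch, returning the per-step losses reported by the
    assumed `train_step(model, batch)` callback.
    """
    losses = []
    for _ in range(epochs):
        for clean, poisoned in zip(clean_batches, poisoned_batches):
            losses.append(train_step(model, clean))     # clean objective
            losses.append(train_step(model, poisoned))  # backdoor objective
    return losses
```

Strict interleaving is one plausible schedule; the paper may weight or order the two objectives differently.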
Yang Dai
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Perovskites, Memristors
Oubo Ma
Zhejiang University
Longfei Zhang
Laboratory for Big Data and Decision, National University of Defense Technology
Xingxing Liang
Laboratory for Big Data and Decision, National University of Defense Technology
Xiaochun Cao
Sun Yat-sen University
Computer Vision, Artificial Intelligence, Multimedia, Machine Learning
Shouling Ji
Professor, Zhejiang University & Georgia Institute of Technology
Data-driven Security, AI Security, Software Security, Privacy
Jiaheng Zhang
Assistant Professor, National University of Singapore
Zero-knowledge proofs, AI safety, Applied cryptography, Blockchain
Jincai Huang
Laboratory for Big Data and Decision, National University of Defense Technology
Li Shen
Shenzhen Campus of Sun Yat-sen University