🤖 AI Summary
This work presents the first systematic study of backdoor attacks against six-degree-of-freedom (6DoF) object pose estimation models. Conventional 2D classification backdoors cannot manipulate continuous geometric parameters such as translation and rotation; to address this limitation, we propose a novel 3D-trigger-based attack framework. Our method employs end-to-end differentiable optimization to model physically realizable 3D triggers and integrates with state-of-the-art pose estimators, including PVNet, DenseFusion, and PoseDiffusion. The attack achieves up to 100% attack success rate (ASR) while preserving clean-sample performance (up to 100% clean ADD accuracy) and attains 97.70% ADD-S on triggered samples. Moreover, it evades a representative defense mechanism. This work bridges a critical gap in the security analysis of 6DoF pose estimation and establishes a new paradigm for evaluating the robustness of 3D vision models.
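For readers unfamiliar with the metrics cited above, the sketch below computes ADD, its symmetric-object variant ADD-S, and the threshold accuracy typically reported, where a pose counts as correct when its error falls below 10% of the object diameter. This is a minimal reference implementation of the standard metric definitions, not code from the paper; function names are our own.

```python
import numpy as np

def add_metric(R_gt, t_gt, R_pred, t_pred, model_pts):
    """ADD: mean distance between model points under the two poses."""
    pts_gt = model_pts @ R_gt.T + t_gt        # (m, 3) points in ground-truth pose
    pts_pred = model_pts @ R_pred.T + t_pred  # (m, 3) points in predicted pose
    return np.linalg.norm(pts_gt - pts_pred, axis=1).mean()

def add_s_metric(R_gt, t_gt, R_pred, t_pred, model_pts):
    """ADD-S: closest-point variant used for symmetric objects."""
    pts_gt = model_pts @ R_gt.T + t_gt
    pts_pred = model_pts @ R_pred.T + t_pred
    # For each ground-truth point, distance to its nearest predicted point.
    d = np.linalg.norm(pts_gt[:, None, :] - pts_pred[None, :, :], axis=2)
    return d.min(axis=1).mean()

def add_accuracy(errors, diameter, frac=0.1):
    """Fraction of samples whose error falls below frac * object diameter."""
    return float(np.mean(np.asarray(errors) < frac * diameter))
```

ASR is computed analogously on triggered inputs: the fraction of samples whose predicted pose lands within the threshold of the attacker's target pose rather than the ground truth.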
📝 Abstract
Deep learning advances have enabled accurate six-degree-of-freedom (6DoF) object pose estimation, which is widely used in robotics, AR/VR, and autonomous systems. However, backdoor attacks pose significant security risks to these models: while most backdoor research targets 2D vision, the security of 6DoF pose estimation remains largely unexplored. Unlike traditional backdoors that merely flip class labels, a 6DoF attack must control continuous parameters such as translation and rotation, rendering 2D methods inapplicable. We propose 6DAttack, a framework that uses 3D object triggers to induce controlled erroneous poses while maintaining normal behavior on clean inputs. Evaluations on PVNet, DenseFusion, and PoseDiffusion across LINEMOD, YCB-Video, and CO3D show high attack success rates (ASRs) without compromising clean performance: backdoored models achieve up to 100% clean ADD accuracy and 100% ASR, with triggered samples reaching 97.70% ADD-S. Furthermore, a representative defense proves ineffective against the attack. Our findings reveal a serious, underexplored threat to 6DoF pose estimation.
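To make the threat model concrete, here is a minimal sketch of the data-poisoning objective such a backdoor implies: most samples train normally, while a poisoned fraction carries the 3D trigger and is relabeled to an attacker-chosen target pose. `add_trigger`, `pose_loss`, and `target_pose` are hypothetical placeholders, and the paper's attack additionally optimizes the physical trigger itself end-to-end rather than fixing it in advance.

```python
import torch

def poisoned_training_step(model, batch, add_trigger, target_pose,
                           pose_loss, poison_rate=0.1):
    """One training step with backdoor poisoning: a small fraction of the
    batch is stamped with the 3D trigger and relabeled to the target pose;
    the rest trains on its true labels, preserving clean performance."""
    imgs, poses = batch                        # images and ground-truth 6DoF poses
    n_poison = int(poison_rate * imgs.shape[0])
    if n_poison > 0:
        imgs = imgs.clone()
        poses = poses.clone()
        imgs[:n_poison] = add_trigger(imgs[:n_poison])  # composite trigger object
        poses[:n_poison] = target_pose                  # attacker-chosen pose label
    preds = model(imgs)
    return pose_loss(preds, poses)             # one loss covers clean + backdoor terms
```

At test time, a model trained this way behaves normally on clean scenes but snaps to the target pose whenever the trigger object appears, which is what the clean-accuracy and ASR numbers above jointly measure.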