DETOUR: A Practical Backdoor Attack against Object Detection

📅 2026-04-27

🤖 AI Summary
Existing backdoor attacks against detection transformers (DETRs) rely on small perturbations at fixed locations as triggers. Such triggers generalize poorly under real-world variations in scale, viewpoint, and position, and are easily lost to imaging distortions. This work proposes DETOUR, a practical backdoor attack for object detection that exploits the trigger radiating effect (TRE): a patch-wise trigger also activates the backdoor at neighboring locations, and placing triggers at multiple locations amplifies this activation. DETOUR uses semantic triggers derived from real objects and combines multi-scale resizing, multi-anchor placement, and multi-viewpoint trigger pattern extraction during backdoor training, so that the backdoor stays responsive under spatial and perspective transformations. According to the summary, experiments show that DETOUR reliably activates the backdoor across diverse physical conditions, improving both the practicality and the effectiveness of the attack.

📝 Abstract
Object detection (OD) is critical to real-world vision systems, yet existing backdoor attacks on detection transformers (DETRs) for OD tasks rely on patch-wise triggers optimized at fixed locations with minimal perturbations. Such attacks overlook that backdoor triggers in the real world may appear at different sizes, fields of view (FoVs), and locations in images, and minimal perturbations are difficult for cameras to capture, which limits attack practicality. We first observe that a patch-wise trigger in DETR remains highly effective at activating the backdoor when it appears at neighboring locations, a phenomenon we term the trigger radiating effect (TRE). Moreover, inserting patch-wise triggers at multiple locations synergistically enhances TRE, resulting in high attack effectiveness across images. We propose DETOUR, a practical backdoor attack that uses semantic triggers effective in real-world object detection systems. To ensure attack practicality, we rescale trigger patterns to different sizes and insert them at various predefined locations during backdoor training, enabling the model to recognize the trigger regardless of its spatial configuration. To address FoV variations in physical deployments, we extract the trigger pattern from a real-world object (e.g., a mug) captured under multiple FoVs and inject the trigger accordingly, promoting viewpoint-invariant backdoor activation and enhancing TRE across the entire image. As a result, the backdoor can be reliably activated under diverse FoVs and spatial configurations.
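The multi-scale, multi-anchor, multi-view insertion described in the abstract can be pictured with a minimal NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names (`resize_nearest`, `paste_trigger`, `insert_triggers`), the nearest-neighbour resizing, the random choice of view and scale per anchor, and the anchor and scale values are all hypothetical, and the handling of detection labels is omitted because the abstract does not describe it.

```python
import numpy as np

def resize_nearest(patch, scale):
    """Nearest-neighbour resize of an HxWxC patch by a scale factor (assumed method)."""
    h, w = patch.shape[:2]
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    # Map each output row/column back to its nearest source index.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return patch[rows][:, cols]

def paste_trigger(image, patch, anchor):
    """Return a copy of image with patch pasted at anchor=(top, left), clipped to bounds."""
    out = image.copy()
    top, left = anchor
    h = min(patch.shape[0], image.shape[0] - top)
    w = min(patch.shape[1], image.shape[1] - left)
    if h > 0 and w > 0:
        out[top:top + h, left:left + w] = patch[:h, :w]
    return out

def insert_triggers(image, views, scales, anchors, rng):
    """At every predefined anchor, paste one randomly chosen view at a random scale."""
    out = image
    for anchor in anchors:
        view = views[rng.integers(len(views))]      # hypothetical: one FoV crop per anchor
        scale = scales[rng.integers(len(scales))]   # hypothetical: one size per anchor
        out = paste_trigger(out, resize_nearest(view, scale), anchor)
    return out

# Usage: a blank 64x64 image, one 8x8 trigger view, three scales, two anchors.
rng = np.random.default_rng(0)
image = np.zeros((64, 64, 3), dtype=np.uint8)
views = [np.full((8, 8, 3), 255, dtype=np.uint8)]
result = insert_triggers(image, views, [0.5, 1.0, 2.0], [(0, 0), (40, 40)], rng)
```

In this sketch each anchor receives an independently sampled view and scale, which is one plausible reading of "various predefined locations" and "different sizes"; the paper may combine them differently.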
Problem

Research questions and friction points this paper is trying to address.

backdoor attack
object detection
trigger robustness
field of view
practicality
Innovation

Methods, ideas, or system contributions that make the work stand out.

backdoor attack
object detection
trigger radiating effect
viewpoint-invariant
semantic trigger