🤖 AI Summary
Video object detection models are vulnerable to universal adversarial attacks, compromising their reliability in safety-critical applications. To address this, we propose a low-distortion, highly imperceptible universal adversarial attack tailored for video detection: structural perturbations are constrained via nuclear norm regularization to concentrate primarily in background regions, and an adaptive optimistic exponentiated gradient optimization scheme is introduced to improve both generation efficiency and cross-model generalizability. Evaluated on mainstream video detectors—including Fast R-CNN and SlowFast—our method significantly outperforms baselines such as low-rank PGD and Frank–Wolfe, achieving a 12.6% average improvement in attack success rate while reducing L₂ distortion by 37%, thereby preserving high visual imperceptibility. The source code and datasets are publicly available.
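The nuclear-norm regularization mentioned above is commonly applied through its proximal operator, singular value soft-thresholding, which shrinks a perturbation matrix toward low rank (i.e., smooth, structured patterns rather than pixel-wise noise). The sketch below illustrates that standard construction only; the matrix sizes, threshold `tau`, and function name are hypothetical and not taken from the paper:

```python
import numpy as np

def nuclear_norm_prox(M, tau):
    """Proximal operator of tau * ||.||_* (singular value soft-thresholding).

    Hypothetical helper: each singular value is shrunk by tau, the standard
    way a nuclear-norm penalty drives a perturbation matrix toward low rank.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # soft-threshold singular values
    return U @ np.diag(s_shrunk) @ Vt

# Toy demo: a rank-2 signal plus small dense noise; thresholding suppresses
# the noise directions and returns a low-rank (structured) matrix.
rng = np.random.default_rng(0)
low_rank = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 8))
noisy = low_rank + 0.1 * rng.standard_normal((8, 8))
denoised = nuclear_norm_prox(noisy, tau=1.0)
```

In an attack loop, such a step would typically alternate with gradient updates on the detection loss, so the perturbation stays low-rank throughout optimization.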
📝 Abstract
Video-based object detection plays a vital role in safety-critical applications. While deep learning-based object detectors have achieved impressive performance, they remain vulnerable to adversarial attacks, particularly those involving universal perturbations. In this work, we propose a minimally distorted universal adversarial attack tailored for video object detection, which leverages nuclear norm regularization to promote structured perturbations concentrated in the background. To optimize this formulation efficiently, we employ an adaptive, optimistic exponentiated gradient method that enhances both scalability and convergence. Our results demonstrate that the proposed attack outperforms both low-rank projected gradient descent and Frank-Wolfe based attacks in effectiveness while maintaining high stealthiness. All code and data are publicly available at https://github.com/jsve96/AO-Exp-Attack.
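The adaptive, optimistic exponentiated gradient method referenced in the abstract can be sketched in its textbook form: a multiplicative (mirror-descent) update on the probability simplex, where the "optimistic" step uses the prediction `2*g_t - g_{t-1}` in place of the raw gradient. This is a generic illustration of the optimizer family, not the paper's implementation; the toy linear objective, step size, and function names are assumptions:

```python
import numpy as np

def optimistic_exp_grad(grad_fn, dim, steps=200, eta=0.5):
    """Optimistic exponentiated-gradient descent on the probability simplex.

    Hypothetical sketch: at each step the gradient prediction 2*g - g_prev
    (the optimistic term) drives a multiplicative update, followed by
    re-normalization so the iterate stays on the simplex.
    """
    w = np.full(dim, 1.0 / dim)       # start at the uniform distribution
    g_prev = np.zeros(dim)
    for _ in range(steps):
        g = grad_fn(w)
        g_hat = 2.0 * g - g_prev      # optimistic gradient prediction
        g_prev = g
        w = w * np.exp(-eta * g_hat)  # exponentiated (multiplicative) update
        w /= w.sum()                  # project back onto the simplex
    return w

# Toy objective f(w) = w @ c: the iterate should concentrate on the
# coordinate with the smallest cost (index 1 here).
c = np.array([3.0, 1.0, 2.0])
w_star = optimistic_exp_grad(lambda w: c, dim=3)
```

An adaptive variant would additionally scale `eta` per coordinate from accumulated gradient magnitudes (AdaGrad-style), which is what typically gives the scalability and convergence benefits the abstract claims.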