🤖 AI Summary
Existing black-box adversarial attacks on video classification models neglect the intrinsic multi-dimensional structure of videos, incur excessive query costs, and produce perceptible perturbations. To address these limitations, this paper proposes the first black-box attack framework based on fourth-order tensor modeling and low-rank perturbation optimization. The input video is explicitly represented as a fourth-order tensor spanning the temporal, spatial, and channel dimensions, and a low-rank constraint is imposed on the perturbation space; gradient estimation is then combined with tensor decomposition to enable efficient black-box optimization. This design drastically reduces the search space, which lowers query complexity while keeping perturbations imperceptible. Experiments on standard video benchmarks show that the method achieves higher attack success rates and better query efficiency than current state-of-the-art black-box video attack approaches.
📝 Abstract
Deep learning models have achieved remarkable success in computer vision but remain vulnerable to adversarial attacks, particularly in black-box settings where model details are unknown. Existing adversarial attack methods (even those that operate on key frames) often treat video data as simple vectors, ignoring its inherent multi-dimensional structure, and require a large number of queries, making them inefficient and detectable. In this paper, we propose TenAd, a novel tensor-based low-rank adversarial attack that leverages the multi-dimensional properties of video data by representing videos as fourth-order tensors. By exploiting a low-rank perturbation structure, our method significantly reduces the search space and the number of queries needed to generate adversarial examples in black-box settings. Experimental results on standard video classification datasets demonstrate that TenAd effectively generates imperceptible adversarial perturbations while achieving higher attack success rates and query efficiency compared to state-of-the-art methods. Our approach outperforms existing black-box adversarial attacks in terms of success rate, query efficiency, and perturbation imperceptibility, highlighting the potential of tensor-based methods for adversarial attacks on video models.
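
To give a rough sense of how a low-rank constraint shrinks the search space, the sketch below compares the number of free values in an unconstrained perturbation with the number in a rank-R factorization of a fourth-order tensor. It is only an illustration: the abstract does not say which tensor decomposition TenAd uses, so the CP-style (sum of rank-one terms) form, the clip dimensions, the rank, and all function names are assumptions chosen for the example.

```python
from math import prod


def dense_param_count(shape):
    """Free values in an unconstrained perturbation with the given tensor shape."""
    return prod(shape)


def cp_param_count(shape, rank):
    """Free values in a rank-`rank` CP-style factorization (assumed form).

    One factor matrix of size (dimension x rank) is stored per tensor mode.
    """
    return rank * sum(shape)


def cp_entry(factors, index):
    """Single entry of the tensor built from the factor matrices.

    factors[m][i][r] is row i, component r of the factor matrix for mode m.
    The entry is the sum over components of the product across all modes.
    """
    rank = len(factors[0][0])
    return sum(
        prod(factors[mode][i][r] for mode, i in enumerate(index))
        for r in range(rank)
    )


# Hypothetical clip: 16 frames of 112 x 112 RGB (illustrative, not from the paper).
shape = (16, 112, 112, 3)
rank = 4  # hypothetical rank

print(dense_param_count(shape))     # 602112
print(cp_param_count(shape, rank))  # 972
```

Under these assumed dimensions, a black-box optimizer would search over 972 values instead of 602,112, which is the kind of reduction that can translate into fewer model queries.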