Can You Trust What You See? Alpha Channel No-Box Attacks on Video Object Detection

📅 2025-10-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of visual imperceptibility in adversarial attacks against video object detection. We first identify a previously unrecognized security vulnerability in the alpha channel of RGBA videos. To exploit it, we propose α-Cloak, the first fully visually imperceptible and format-compatible no-box adversarial attack leveraging the alpha channel. By exploiting alpha compositing, α-Cloak implicitly embeds a malicious target video into a benign video stream without requiring any access to the target model's architecture, parameters, or outputs, and without issuing any queries. Extensive experiments demonstrate that α-Cloak achieves a 100% attack success rate across five state-of-the-art detectors, including YOLOv8 and RT-DETR, as well as the multimodal large model Gemini-2.0-Flash. Crucially, attacked videos retain the original visual fidelity and full playback compatibility. This work establishes a novel attack paradigm and evaluation benchmark for video AI security.

📝 Abstract
As object detection models are increasingly deployed in cyber-physical systems such as autonomous vehicles (AVs) and surveillance platforms, ensuring their security against adversarial threats is essential. While prior work has explored adversarial attacks in the image domain, such attacks in the video domain remain largely unexamined, especially in the no-box setting. In this paper, we present α-Cloak, the first no-box adversarial attack on object detectors that operates entirely through the alpha channel of RGBA videos. α-Cloak exploits the alpha channel to fuse a malicious target video with a benign video, resulting in a fused video that appears innocuous to human viewers but consistently fools object detectors. Our attack requires no access to model architecture, parameters, or outputs, and introduces no perceptible artifacts. We systematically study the support for alpha channels across common video formats and playback applications, and design a fusion algorithm that ensures visual stealth and compatibility. We evaluate α-Cloak on five state-of-the-art object detectors, a vision-language model, and a multi-modal large language model (Gemini-2.0-Flash), demonstrating a 100% attack success rate across all scenarios. Our findings reveal a previously unexplored vulnerability in video-based perception systems, highlighting the urgent need for defenses that account for the alpha channel in adversarial settings.
Problem

Research questions and friction points this paper is trying to address.

Developing no-box adversarial attacks on video object detection systems
Exploiting alpha channel vulnerabilities to deceive object detectors invisibly
Addressing security risks in autonomous vehicles and surveillance platforms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Alpha channel manipulation for video attacks
No-box adversarial attack without model access
Fusion algorithm ensures stealth and compatibility
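The core discrepancy the attack exploits can be illustrated with a toy sketch. This is not the authors' actual fusion algorithm; it is a minimal, hypothetical demonstration of the alpha-compositing gap: a player that honors the alpha channel composites the frame over its background, while a typical detection pipeline drops alpha and reads the raw RGB planes directly. All frame contents and sizes below are made up for illustration.

```python
import numpy as np

H, W = 4, 4  # toy frame size (hypothetical)

# Stand-ins: the content a detector should "see" (malicious) and the
# content a human viewer should see after compositing (benign).
malicious_rgb = np.full((H, W, 3), 200, dtype=np.uint8)
benign_bg = np.full((H, W, 3), 30, dtype=np.uint8)

# Fused RGBA frame: RGB planes carry the malicious content while
# alpha = 0 marks every pixel as fully transparent.
alpha = np.zeros((H, W, 1), dtype=np.uint8)
fused = np.concatenate([malicious_rgb, alpha], axis=-1)  # shape (H, W, 4)

def composite(rgba, background):
    """Standard source-over alpha compositing: out = a*fg + (1-a)*bg."""
    a = rgba[..., 3:4].astype(np.float32) / 255.0
    fg = rgba[..., :3].astype(np.float32)
    out = a * fg + (1.0 - a) * background.astype(np.float32)
    return out.astype(np.uint8)

viewer_sees = composite(fused, benign_bg)  # player honors alpha
detector_sees = fused[..., :3]             # pipeline discards alpha

assert np.array_equal(viewer_sees, benign_bg)        # human: benign
assert np.array_equal(detector_sees, malicious_rgb)  # model: malicious
```

With fully transparent pixels the composited view reduces to the benign background, yet a loader that ignores alpha recovers the malicious RGB at full strength; the paper's fusion algorithm additionally ensures the result survives real video formats and players.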
Ariana Yi
Mission San Jose High School
Ce Zhou
Missouri University of Science and Technology
Liyang Xiao
Michigan State University
Qiben Yan
Computer Science and Engineering, Michigan State University
Security and Privacy · Cyber-Physical Systems · AI Agent · Internet-of-Things · Smart Contract