Can You Trust What You See? Alpha Channel No-Box Attacks on Video Object Detection

📅 2025-10-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of visual imperceptibility in adversarial attacks against video object detection. We first identify a previously unrecognized security vulnerability in the alpha channel of RGBA videos. To exploit it, we propose α-Cloak, the first fully visually imperceptible and format-compatible no-box adversarial attack leveraging the alpha channel. By exploiting alpha compositing, α-Cloak implicitly embeds a malicious target video into a benign video stream without requiring any access to the target model's architecture, parameters, or outputs, and without issuing any queries. Extensive experiments demonstrate that α-Cloak achieves a 100% attack success rate across five state-of-the-art detectors, including YOLOv8 and RT-DETR, as well as the multimodal large model Gemini-2.0-Flash. Crucially, attacked videos retain the original visual fidelity and full playback compatibility. This work establishes a novel attack paradigm and evaluation benchmark for video AI security.

📝 Abstract
As object detection models are increasingly deployed in cyber-physical systems such as autonomous vehicles (AVs) and surveillance platforms, ensuring their security against adversarial threats is essential. While prior work has explored adversarial attacks in the image domain, such attacks in the video domain remain largely unexamined, especially in the no-box setting. In this paper, we present α-Cloak, the first no-box adversarial attack on object detectors that operates entirely through the alpha channel of RGBA videos. α-Cloak exploits the alpha channel to fuse a malicious target video with a benign video, resulting in a fused video that appears innocuous to human viewers but consistently fools object detectors. Our attack requires no access to model architecture, parameters, or outputs, and introduces no perceptible artifacts. We systematically study the support for alpha channels across common video formats and playback applications, and design a fusion algorithm that ensures visual stealth and compatibility. We evaluate α-Cloak on five state-of-the-art object detectors, a vision-language model, and a multi-modal large language model (Gemini-2.0-Flash), demonstrating a 100% attack success rate across all scenarios. Our findings reveal a previously unexplored vulnerability in video-based perception systems, highlighting the urgent need for defenses that account for the alpha channel in adversarial settings.
Problem

Research questions and friction points this paper is trying to address.

Developing no-box adversarial attacks on video object detection systems
Exploiting alpha channel vulnerabilities to deceive object detectors invisibly
Addressing security risks in autonomous vehicles and surveillance platforms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Alpha channel manipulation for video attacks
No-box adversarial attack without model access
Fusion algorithm ensures stealth and compatibility
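The core discrepancy the attack exploits can be illustrated with a toy sketch. This is not the authors' actual fusion algorithm; it is a minimal, hypothetical demonstration of the alpha-compositing gap: a player that honors the alpha channel composites the frame over its background, while a typical detection pipeline drops alpha and reads the raw RGB planes directly. All frame contents and sizes below are made up for illustration.

```python
import numpy as np

H, W = 4, 4  # toy frame size (hypothetical)

# Stand-ins: the content a detector should "see" (malicious) and the
# content a human viewer should see after compositing (benign).
malicious_rgb = np.full((H, W, 3), 200, dtype=np.uint8)
benign_bg = np.full((H, W, 3), 30, dtype=np.uint8)

# Fused RGBA frame: RGB planes carry the malicious content while
# alpha = 0 marks every pixel as fully transparent.
alpha = np.zeros((H, W, 1), dtype=np.uint8)
fused = np.concatenate([malicious_rgb, alpha], axis=-1)  # shape (H, W, 4)

def composite(rgba, background):
    """Standard source-over alpha compositing: out = a*fg + (1-a)*bg."""
    a = rgba[..., 3:4].astype(np.float32) / 255.0
    fg = rgba[..., :3].astype(np.float32)
    out = a * fg + (1.0 - a) * background.astype(np.float32)
    return out.astype(np.uint8)

viewer_sees = composite(fused, benign_bg)  # player honors alpha
detector_sees = fused[..., :3]             # pipeline discards alpha

assert np.array_equal(viewer_sees, benign_bg)        # human: benign
assert np.array_equal(detector_sees, malicious_rgb)  # model: malicious
```

With fully transparent pixels the composited view reduces to the benign background, yet a loader that ignores alpha recovers the malicious RGB at full strength; the paper's fusion algorithm additionally ensures the result survives real video formats and players.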
Ariana Yi
Mission San Jose High School
Ce Zhou
Missouri University of Science and Technology
Liyang Xiao
Michigan State University
Qiben Yan
Computer Science and Engineering, Michigan State University
Security and Privacy · Cyber-Physical Systems · AI Agent · Internet-of-Things · Smart Contract