AI Summary
Existing adversarial patch methods perform poorly on visual-infrared (VI) dense prediction tasks: because they ignore cross-spectral inconsistencies, their attacks are weak and easy to spot. This work proposes AP-PCO, a framework that jointly optimizes patch position and color so that a single patch perturbs both the visible and infrared modalities in a black-box setting, without access to internal model information. The method drives the optimization with a fitness function derived from model outputs and adds a cross-modal color adaptation strategy that reduces patch saliency in both spectral domains. Experiments on VI dense prediction models show consistently strong attack performance across multiple architectures, providing a practical benchmark for evaluating the robustness of multimodal perception systems.
Abstract
Multimodal adversarial attacks for dense prediction remain largely underexplored. In particular, visual-infrared (VI) perception systems introduce unique challenges due to heterogeneous spectral characteristics and modality-specific intensity distributions. Existing adversarial patch methods are primarily designed for single-modal inputs and fail to account for cross-spectral inconsistencies, leading to reduced attack effectiveness and poor stealthiness when applied to VI dense prediction models. To address these challenges, we propose a joint position-color optimization framework (AP-PCO) for generating adversarial patches in visual-infrared settings. The proposed method optimizes patch placement and color composition simultaneously using a fitness function derived from model outputs, enabling a single patch to perturb both visible and infrared modalities. To further bridge spectral discrepancies, we introduce a cross-modal color adaptation strategy that constrains patch appearance according to infrared grayscale characteristics while maintaining strong perturbations in the visible domain, thereby reducing cross-spectral saliency. The optimization procedure operates without requiring internal model information, supporting flexible black-box attacks. Extensive experiments on visual-infrared dense prediction tasks demonstrate that the proposed AP-PCO achieves consistently strong attack performance across multiple architectures, providing a practical benchmark for robustness evaluation in VI perception systems.
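The joint position-color search described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the optimizer, so a simple hill-climbing search stands in for it. The black-box `toy_model`, the assumption that a patch appears in the infrared image as its grayscale luminance, the saliency penalty, and the weight `lam` are all hypothetical choices made for illustration.

```python
import random

H, W, S = 16, 16, 4  # toy image size and square patch side (illustrative values)

def luminance(color):
    """Grayscale value of an RGB color in [0, 1], using Rec. 601 luma weights."""
    r, g, b = color
    return 0.299 * r + 0.587 * g + 0.114 * b

def toy_model(vis, ir):
    """Hypothetical black box standing in for a VI dense predictor.
    Returns a per-pixel score map; only its outputs are ever used."""
    return [[0.5 * ir[y][x] + 0.5 * (vis[y][x][0] - vis[y][x][2])
             for x in range(W)] for y in range(H)]

def apply_patch(vis, ir, x0, y0, color):
    """Paste one patch into both modalities. ASSUMPTION: the patch's
    infrared appearance equals the grayscale luminance of its color."""
    vis2 = [row[:] for row in vis]
    ir2 = [row[:] for row in ir]
    gray = luminance(color)
    for y in range(y0, y0 + S):
        for x in range(x0, x0 + S):
            vis2[y][x] = color
            ir2[y][x] = gray
    return vis2, ir2

def fitness(model, vis, ir, clean_out, cand, lam=0.5):
    """Output-driven fitness: mean change in model output, minus a penalty
    for how far the patch's gray level is from the local infrared background."""
    x0, y0, color = cand
    vis2, ir2 = apply_patch(vis, ir, x0, y0, color)
    out = model(vis2, ir2)
    damage = sum(abs(out[y][x] - clean_out[y][x])
                 for y in range(H) for x in range(W)) / (H * W)
    local_ir = sum(ir[y][x] for y in range(y0, y0 + S)
                   for x in range(x0, x0 + S)) / (S * S)
    saliency = abs(luminance(color) - local_ir)
    return damage - lam * saliency

def mutate(cand, rng):
    """Jointly perturb position and color, clamped to valid ranges."""
    x0, y0, color = cand
    x0 = min(max(x0 + rng.randint(-2, 2), 0), W - S)
    y0 = min(max(y0 + rng.randint(-2, 2), 0), H - S)
    color = tuple(min(max(c + rng.uniform(-0.2, 0.2), 0.0), 1.0) for c in color)
    return x0, y0, color

def search(model, vis, ir, iters=200, seed=0):
    """Hill climbing over (x0, y0, color) using only model outputs."""
    rng = random.Random(seed)
    clean_out = model(vis, ir)
    best = (rng.randint(0, W - S), rng.randint(0, H - S),
            (rng.random(), rng.random(), rng.random()))
    init_fit = best_fit = fitness(model, vis, ir, clean_out, best)
    for _ in range(iters):
        cand = mutate(best, rng)
        f = fitness(model, vis, ir, clean_out, cand)
        if f > best_fit:  # keep a candidate only if it improves the fitness
            best, best_fit = cand, f
    return best, best_fit, init_fit

if __name__ == "__main__":
    data_rng = random.Random(1)
    vis = [[(data_rng.random(), data_rng.random(), data_rng.random())
            for _ in range(W)] for _ in range(H)]
    ir = [[data_rng.random() for _ in range(W)] for _ in range(H)]
    best, best_fit, init_fit = search(toy_model, vis, ir)
    print("patch position:", best[:2], "fitness:", init_fit, "->", best_fit)
```

Because the search only calls `model(vis, ir)` and reads its outputs, it needs no gradients or internal parameters, which is the sense in which such an attack is black-box. The penalty term is one simple way to express the trade-off the abstract describes: it pulls the patch's gray level toward the infrared background while the damage term still rewards strong changes in the model output.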