🤖 AI Summary
Visually-aware recommendation systems (VARS) are vulnerable to imperceptible adversarial image attacks, posing critical security risks. Method: This paper proposes a unified defense framework integrating adversarial reconstruction and detection. It introduces, for the first time, a global Vision Transformer (ViT)-driven image reconstruction module jointly trained end-to-end with a contrastive learning–based adversarial detection module, so the framework provides both filtering and discriminative capabilities. The framework is agnostic to the attack type (e.g., FGSM, PGD) and to the underlying VARS architecture. Contribution/Results: Extensive experiments on two real-world datasets demonstrate that the framework significantly improves recommendation robustness under adversarial perturbations and detects adversarial samples with high accuracy (AUC > 0.96), outperforming state-of-the-art defense methods across all evaluated metrics.
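The summary describes the detection module only at a high level. The sketch below shows a generic InfoNCE-style contrastive objective of the kind such a detector could be trained with: pulling a clean embedding toward another clean view while pushing it away from adversarial embeddings. All names and vectors here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style contrastive loss: low when the anchor is close to its
    positive (e.g. another clean view) and far from the negatives (e.g.
    adversarial embeddings), high otherwise."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives])
    logits = sims / tau
    logits -= logits.max()                    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                  # cross-entropy on the positive pair

# Hypothetical 2-D embeddings for illustration only.
clean = np.array([1.0, 0.0])
clean_view = np.array([1.0, 0.1])    # slightly augmented clean embedding
adversarial = np.array([-1.0, 0.0])  # embedding displaced by an attack
loss = info_nce(clean, clean_view, [adversarial])
```

Minimizing a loss of this form shapes the embedding space so that clean and adversarial examples separate, which is what makes a downstream detector discriminative.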
📝 Abstract
With rich visual data, such as images, becoming readily associated with items, visually-aware recommendation systems (VARS) have been widely adopted in different applications. Recent studies have shown that VARS are vulnerable to item-image adversarial attacks, which add human-imperceptible perturbations to the clean images associated with items. Such attacks pose new security challenges to a wide range of applications, such as e-Commerce and social networks, where VARS are widely used, so securing VARS against them becomes a critical problem. However, there is still a lack of systematic study on how to design defense strategies against visual attacks on VARS. In this paper, we attempt to fill this gap by proposing an adversarial image reconstruction and detection framework to secure VARS. Our proposed method can simultaneously (1) secure VARS from adversarial attacks characterized by local perturbations through image reconstruction based on global vision transformers; and (2) accurately detect adversarial examples using a novel contrastive learning approach. The framework is designed to serve as both a filter and a detector, and the two components are jointly trained to improve the flexibility of our defense against a variety of attacks and VARS models. We have conducted extensive experiments with two popular attack methods (FGSM and PGD) on two real-world datasets. The results show that our defense against visual attacks is effective and outperforms existing methods under different attacks, and that our method detects adversarial examples with high accuracy.
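FGSM and PGD, the two attacks evaluated here, have standard closed forms. A minimal numpy sketch, using a toy linear loss in place of a real VARS model gradient (the weights `w`, the step sizes, and all values are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def fgsm(x, grad, eps):
    """One-step FGSM: perturb each pixel by eps in the gradient-sign direction."""
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

def pgd(x, grad_fn, eps, alpha, steps):
    """PGD: iterated FGSM steps, each projected back into the L-inf eps-ball."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep valid pixel range
    return x_adv

# Toy "image" and a hypothetical linear loss L(x) = w . x, so grad L = w.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.5, 0.5, 0.5])
x_fgsm = fgsm(x, w, eps=0.03)
x_pgd = pgd(x, lambda z: w, eps=0.03, alpha=0.01, steps=5)
```

Both attacks bound the perturbation by `eps` in the L-infinity norm, which is what makes the resulting item images human-imperceptible for small `eps`.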