Protego: Detecting Adversarial Examples for Vision Transformers via Intrinsic Capabilities

📅 2024-08-12
🏛️ 2024 IEEE International Conference on Metaverse Computing, Networking, and Applications (MetaCom)
📈 Citations: 0
Influential: 0
🤖 AI Summary
Vision Transformers (ViTs) exhibit significant vulnerability to adversarial attacks in emerging applications such as the metaverse, posing critical security risks. Method: This paper proposes Protego, an adversarial example detection framework grounded in the ViT's intrinsic attention mechanism. The authors systematically characterize token-level attention shifts under six mainstream adversarial attacks and formulate attention propagation, together with gradient-based attention propagation, as an indicator of decision bias, departing from conventional paradigms that rely on feature-distribution modeling or reconstruction error. Using attention rollout, gradient attention rollout, and token representation modeling, they train a lightweight binary classifier for detection. Results: Protego achieves AUC scores exceeding 0.95 across all six attack types, substantially outperforming state-of-the-art methods, and provides a practical, deployable defense baseline for robust ViT operation in open-world environments.

📝 Abstract
Transformer models have excelled in natural language tasks, prompting the vision community to explore their use in computer vision problems. However, these models remain vulnerable to adversarial examples. In this paper, we investigate the attack capabilities of six common adversarial attacks on three pre-trained ViT models to reveal the vulnerability of ViT models. To understand and analyse the bias in neural network decisions when the input is adversarial, we use two visualisation techniques: attention rollout and grad attention rollout. To protect ViT models from adversarial attack, we propose Protego, a detection framework that leverages the transformer's intrinsic capabilities to detect adversarial examples targeting ViT models. This is challenging due to the diversity of attack strategies that adversaries may adopt. Inspired by the attention mechanism, we observe that the token used for the model's prediction aggregates information from the entire input sample, and that the attention region for adversarial examples differs from that of normal examples. Given these observations, we train a detector that achieves performance superior to existing detection methods in identifying adversarial examples. Our experiments demonstrate the high effectiveness of our detection method: for all six adversarial attack methods, our detector's AUC scores exceed 0.95. Protego may advance investigations in metaverse security.
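The attention rollout technique referenced in the abstract (Abnar & Zuidema, 2020) propagates attention scores through the transformer's layers while accounting for residual connections. A minimal sketch of the standard computation, assuming per-layer attention matrices have already been extracted from a ViT (the function name and array layout are illustrative, not from the paper):

```python
import numpy as np

def attention_rollout(attentions):
    """Standard attention rollout: cumulatively multiply per-layer
    attention maps, adding the identity to model residual connections.

    attentions: list of (num_heads, tokens, tokens) arrays, one per layer.
    Returns a (tokens, tokens) matrix of accumulated attention flow.
    """
    rollout = None
    for attn in attentions:
        a = attn.mean(axis=0)                  # average attention over heads
        a = a + np.eye(a.shape[0])             # add residual (skip) connection
        a = a / a.sum(axis=-1, keepdims=True)  # re-normalize rows to sum to 1
        # Multiply into the running product to propagate across layers
        rollout = a if rollout is None else rollout @ a
    return rollout
```

The row of the result corresponding to the class token gives a per-token relevance map; Protego's reported approach compares such maps (and their gradient-weighted variant) between clean and adversarial inputs to train its detector.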
Problem

Research questions and friction points this paper is trying to address.

Adversarial Examples
Image Recognition
Metaverse Security
Innovation

Methods, ideas, or system contributions that make the work stand out.

Protego
Transformer Model
Adversarial Image Defense