Improving Generalizability and Undetectability for Targeted Adversarial Attacks on Multimodal Pre-trained Models

📅 2025-09-24
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing targeted adversarial attacks against multimodal pre-trained models (e.g., ImageBind) suffer from poor generalizability — failing to transfer to semantically similar targets — and low undetectability — being readily flagged by simple anomaly detectors. To address these limitations, the authors propose Proxy Targeted Attack (PTA), a theoretically grounded optimization framework that leverages multiple source-modal and target-modal proxies to craft adversarial examples that align with multiple potential targets while remaining evasive to defenses. A theoretical analysis characterizes the relationship between generalizability and undetectability, ensuring optimal generalizability under a specified undetectability requirement. Experiments report an average 27.6% improvement in attack success rate across semantically related targets and over 92% evasion of anomaly detectors including LID and Mahalanobis, advancing both the generalizability and the stealthiness of multimodal adversarial attacks.

📝 Abstract
Multimodal pre-trained models (e.g., ImageBind), which align distinct data modalities into a shared embedding space, have shown remarkable success across downstream tasks. However, their increasing adoption raises serious security concerns, especially regarding targeted adversarial attacks. In this paper, we show that existing targeted adversarial attacks on multimodal pre-trained models still have limitations in two aspects: generalizability and undetectability. Specifically, the crafted targeted adversarial examples (AEs) exhibit limited generalization to partially known or semantically similar targets in cross-modal alignment tasks (i.e., limited generalizability) and can be easily detected by simple anomaly detection methods (i.e., limited undetectability). To address these limitations, we propose a novel method called Proxy Targeted Attack (PTA), which leverages multiple source-modal and target-modal proxies to optimize targeted AEs, ensuring they remain evasive to defenses while aligning with multiple potential targets. We also provide theoretical analyses to highlight the relationship between generalizability and undetectability and to ensure optimal generalizability while meeting the specified requirements for undetectability. Furthermore, experimental results demonstrate that our PTA can achieve a high success rate across various related targets and remain undetectable against multiple anomaly detection methods.
Problem

Research questions and friction points this paper is trying to address.

Targeted adversarial attacks lack generalizability to similar targets
Existing attacks have limited undetectability against anomaly detection
Multimodal pre-trained models face security vulnerabilities in cross-modal alignment
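To make the second friction point concrete: a Mahalanobis-distance detector is one of the simple anomaly detectors such attacks must evade — it flags any input whose embedding lies far from the clean embedding distribution. A minimal stand-alone sketch (the 8-dimensional Gaussian embeddings and the 95th-percentile threshold are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative "clean" embeddings (assumption: 8-dim Gaussian features
# standing in for a real multimodal model's embedding space).
clean = rng.standard_normal((500, 8))
mu = clean.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(clean, rowvar=False))

def mahalanobis(z):
    """Distance of embedding z from the clean-data distribution."""
    d = z - mu
    return float(np.sqrt(d @ cov_inv @ d))

# Flag anything beyond the 95th percentile of clean scores.
threshold = np.percentile([mahalanobis(z) for z in clean], 95)

print(mahalanobis(mu) <= threshold)        # an in-distribution embedding passes
print(mahalanobis(mu + 10.0) > threshold)  # a far-off embedding is flagged
```

In the paper's framing, an attack with good undetectability must keep the adversarial example's embedding below such a detector's threshold while still dragging it toward the target.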
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizes targeted adversarial examples using multiple source-modal and target-modal proxies
Keeps adversarial examples evasive against anomaly detection methods
Aligns adversarial examples with multiple potential semantically related targets
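The proxy-guided optimization above can be sketched in miniature: push the input's embedding toward several target proxies at once while constraining the perturbation size. Everything below is an assumption for illustration — a random linear map stands in for a real encoder such as ImageBind, and an L-infinity budget stands in for the paper's undetectability constraint; a real attack would backpropagate through the actual network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in encoder (assumption): a frozen random linear map followed by
# L2 normalization plays the role of a multimodal embedding model.
d_in, d_emb = 32, 8
W = rng.standard_normal((d_emb, d_in))

def embed(x):
    z = W @ x
    return z / np.linalg.norm(z)

# Several target-modal "proxies": unit embeddings of semantically related
# targets (e.g. near-synonymous captions), modeled as unit vectors
# clustered around a shared direction -- purely illustrative.
center = rng.standard_normal(d_emb)
proxies = [center + 0.1 * rng.standard_normal(d_emb) for _ in range(4)]
proxies = [p / np.linalg.norm(p) for p in proxies]

def proxy_attack(x_clean, proxies, eps=0.1, alpha=0.01, steps=200):
    """PGD-style sketch: align embed(x) with ALL target proxies at once,
    keeping the perturbation inside an L-inf ball of radius eps (a crude
    stand-in for an undetectability constraint)."""
    # With a linear encoder, the gradient of sum_p <W x, p> w.r.t. x is
    # constant: W^T @ sum(proxies). We follow its sign, PGD style.
    g = W.T @ np.sum(proxies, axis=0)
    x = x_clean.copy()
    for _ in range(steps):
        x = x + alpha * np.sign(g)                     # ascent step
        x = x_clean + np.clip(x - x_clean, -eps, eps)  # project to eps-ball
    return x

x_clean = rng.standard_normal(d_in)
x_adv = proxy_attack(x_clean, proxies)

sim_before = np.mean([embed(x_clean) @ p for p in proxies])
sim_after = np.mean([embed(x_adv) @ p for p in proxies])
print(f"mean proxy similarity: {sim_before:.3f} -> {sim_after:.3f}")
print(f"max |perturbation|: {np.max(np.abs(x_adv - x_clean)):.3f}")
```

Optimizing against the average of several proxies, rather than a single target embedding, is what lets the resulting example generalize to semantically similar targets rather than overfitting to one.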