Adversarial Illusions in Multi-Modal Embeddings

πŸ“… 2023-08-22
πŸ›οΈ USENIX Security Symposium
πŸ“ˆ Citations: 15
✨ Influential: 1
πŸ“„ PDF
πŸ€– AI Summary
This work uncovers cross-modal adversarial vulnerabilities in multi-modal embeddings (e.g., ImageBind, AudioCLIP): by applying an imperceptible perturbation to an input image or audio clip, an adversary can align its embedding with an arbitrary, adversary-chosen target in another modality (text, audio, or image), without any knowledge of downstream tasks. The paper introduces a task-agnostic, target-controllable, cross-modal attack, dubbed "adversarial illusions", combining gradient-driven perturbation generation with a query-based black-box variant, and demonstrates the first adversarial alignment attack against a proprietary commercial model (Amazon's Titan embedding). Experiments show that adversarially aligned inputs mislead image generation, text generation, zero-shot classification, and audio retrieval, and that illusions transfer across diverse embedding architectures. The authors also systematically evaluate countermeasures and evasion strategies, revealing fundamental limitations in current robustness mechanisms for multi-modal alignment.
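The core mechanism described above is gradient ascent on embedding similarity: perturb an input, within an imperceptibility budget, so that its embedding moves toward an adversary-chosen target embedding from another modality. Below is a minimal sketch of that idea using a toy linear "embedding model" in NumPy (the real attack targets models like ImageBind; the matrix `W`, the cosine-similarity objective written out analytically, and all hyperparameters here are illustrative assumptions, not the paper's implementation).

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_EMB = 64, 16
W = rng.standard_normal((D_EMB, D_IN))  # toy linear stand-in for an embedding model

def embed(x):
    return W @ x

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cos_grad_x(x, t):
    # Analytic gradient of cos(embed(x), t) w.r.t. x for the linear toy model.
    e = W @ x
    ne, nt = np.linalg.norm(e), np.linalg.norm(t)
    g_e = t / (ne * nt) - (e @ t) * e / (ne**3 * nt)
    return W.T @ g_e

def pgd_align(x0, t, eps=0.1, step=0.02, iters=200):
    # Projected gradient ascent: maximize cosine similarity to the target
    # embedding while keeping the perturbation in an L-infinity ball
    # (the "imperceptibility" constraint).
    delta = np.zeros_like(x0)
    for _ in range(iters):
        g = cos_grad_x(x0 + delta, t)
        delta = np.clip(delta + step * np.sign(g), -eps, eps)
    return x0 + delta

x = rng.standard_normal(D_IN)           # benign input (e.g., an image)
t = embed(rng.standard_normal(D_IN))    # adversary-chosen target embedding
x_adv = pgd_align(x, t)
```

Because the objective is proximity in embedding space rather than any task loss, the same perturbed input fools every downstream task that consumes the embedding, which is the "wholesale compromise" the abstract describes.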
πŸ“ Abstract
Multi-modal embeddings encode texts, images, thermal images, sounds, and videos into a single embedding space, aligning representations across different modalities (e.g., associate an image of a dog with a barking sound). In this paper, we show that multi-modal embeddings can be vulnerable to an attack we call "adversarial illusions." Given an image or a sound, an adversary can perturb it to make its embedding close to an arbitrary, adversary-chosen input in another modality. These attacks are cross-modal and targeted: the adversary can align any image or sound with any target of his choice. Adversarial illusions exploit proximity in the embedding space and are thus agnostic to downstream tasks and modalities, enabling a wholesale compromise of current and future tasks, as well as modalities not available to the adversary. Using ImageBind and AudioCLIP embeddings, we demonstrate how adversarially aligned inputs, generated without knowledge of specific downstream tasks, mislead image generation, text generation, zero-shot classification, and audio retrieval. We investigate transferability of illusions across different embeddings and develop a black-box version of our method that we use to demonstrate the first adversarial alignment attack on Amazon's commercial, proprietary Titan embedding. Finally, we analyze countermeasures and evasion attacks.
Problem

Research questions and friction points this paper is trying to address.

Multi-modal embeddings are vulnerable to cross-modal adversarial attacks
Adversaries perturb inputs to align their embeddings with arbitrary targets in other modalities
Attacks compromise diverse downstream tasks without task-specific knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-modal adversarial attack method
Exploits embedding space proximity
Black-box attack on commercial embeddings
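The last innovation, attacking a query-only commercial embedding such as Amazon's Titan, means the adversary cannot backpropagate through the model. A standard way to realize this is zeroth-order gradient estimation from similarity queries alone; the sketch below applies an SPSA-style finite-difference estimator to the same toy linear embedding used as a stand-in (the estimator choice, `W`, and all hyperparameters are illustrative assumptions, not the paper's exact black-box method).

```python
import numpy as np

rng = np.random.default_rng(1)
D_IN, D_EMB = 64, 16
W = rng.standard_normal((D_EMB, D_IN))

def embed(x):
    # Stand-in for a black-box embedding API: the attacker may query it
    # but has no access to its weights or gradients.
    return W @ x

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def spsa_grad(x, t, c=1e-3, samples=32):
    # Estimate the gradient of cos(embed(x), t) from paired queries along
    # random sign directions (simultaneous perturbation stochastic approximation).
    g = np.zeros_like(x)
    for _ in range(samples):
        u = rng.choice([-1.0, 1.0], size=x.shape)
        diff = cos(embed(x + c * u), t) - cos(embed(x - c * u), t)
        g += (diff / (2 * c)) * u
    return g / samples

def black_box_align(x0, t, eps=0.1, step=0.02, iters=100):
    # Same projected ascent loop as the white-box attack, but driven by
    # query-estimated gradients instead of backpropagation.
    delta = np.zeros_like(x0)
    for _ in range(iters):
        g = spsa_grad(x0 + delta, t)
        delta = np.clip(delta + step * np.sign(g), -eps, eps)
    return x0 + delta

x = rng.standard_normal(D_IN)
t = embed(rng.standard_normal(D_IN))
x_adv = black_box_align(x, t)
```

The trade-off is query cost: each ascent step spends `2 * samples` API calls, so a practical attack balances estimator variance against the provider's rate limits and billing.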
πŸ”Ž Similar Papers
No similar papers found.