Robust image classification with multi-modal large language models

📅 2024-12-13
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing adversarial defense methods are predominantly unimodal and neglect semantic consistency between vision and language modalities. To address this limitation, we propose MultiShield, the first framework to leverage multimodal large language models (e.g., LLaVA, Qwen-VL) for adversarial example detection. MultiShield dynamically assesses representation consistency via cross-modal visual-textual embedding alignment, enabling uncertainty-aware and interpretable rejection without requiring model retraining, which supports plug-and-play deployment. Evaluated on CIFAR-10 and ImageNet, MultiShield significantly enhances robustness: it improves adversarial detection accuracy, increases the rejection rate by 23.6%, and reduces the false rejection rate to 1.2%. By explicitly modeling semantic alignment across modalities, MultiShield bridges the gap left by unimodal defenses in capturing cross-modal semantic consistency.

📝 Abstract
Deep Neural Networks are vulnerable to adversarial examples, i.e., carefully crafted input samples that can cause models to make incorrect predictions with high confidence. To mitigate these vulnerabilities, adversarial training and detection-based defenses have been proposed to strengthen models in advance. However, most of these approaches focus on a single data modality, overlooking the relationships between visual patterns and textual descriptions of the input. In this paper, we propose a novel defense, MultiShield, designed to combine and complement these defenses with multi-modal information to further enhance their robustness. MultiShield leverages multi-modal large language models to detect adversarial examples and abstain from uncertain classifications when there is no alignment between textual and visual representations of the input. Extensive evaluations on CIFAR-10 and ImageNet datasets, using robust and non-robust image classification models, demonstrate that MultiShield can be easily integrated to detect and reject adversarial examples, outperforming the original defenses.
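The abstract describes the core mechanism: accept a classifier's prediction only when the visual and textual representations of the input agree, and abstain otherwise. A minimal sketch of such a consistency check, assuming a CLIP-style shared embedding space and a hypothetical threshold `tau` (the paper's actual scoring and abstention rule may differ):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def check_prediction(image_emb: np.ndarray,
                     label_text_emb: np.ndarray,
                     tau: float = 0.25) -> tuple[str, float]:
    """Accept the classifier's label only if the image embedding and the
    embedding of the label's textual description are aligned; otherwise
    abstain (reject) as a suspected adversarial example."""
    score = cosine_similarity(image_emb, label_text_emb)
    decision = "accept" if score >= tau else "reject"
    return decision, score

# Toy example with 2-D embeddings (real embeddings would come from a
# multimodal model's vision and text encoders).
img = np.array([1.0, 0.0])
aligned_text = np.array([0.9, 0.1])     # description matches the image
misaligned_text = np.array([0.0, 1.0])  # description contradicts the image

print(check_prediction(img, aligned_text))     # high similarity -> accept
print(check_prediction(img, misaligned_text))  # low similarity  -> reject
```

The threshold `tau` trades off the rejection rate on adversarial inputs against the false rejection rate on clean inputs; the values reported in the summary (23.6% and 1.2%) come from the paper's evaluation, not from this sketch.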
Problem

Research questions and friction points this paper is trying to address.

Address vulnerability of Deep Neural Networks to adversarial examples
Enhance robustness using multi-modal information integration
Detect adversarial examples via visual-textual alignment in MultiShield
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages multi-modal large language models
Detects adversarial examples via visual-textual representation alignment
Integrates textual and visual representations
Francesco Villani
University of Genoa, Via Dodecaneso 35, Genoa, 16145, Italy
Igor Maljkovic
University of Genoa, Via Dodecaneso 35, Genoa, 16145, Italy
Dario Lazzaro
Sapienza University of Rome, Via Ariosto 25, Rome, 00185, Italy
Angelo Sotgiu
Assistant Professor, University of Cagliari
A. E. Cinà
University of Genoa, Via Dodecaneso 35, Genoa, 16145, Italy
Fabio Roli
Professor, University of Genova and Cagliari, Italy
Pattern recognition · Machine learning · Computer vision · Computer security