Feature-Space Adversarial Robustness Certification for Multimodal Large Language Models

📅 2026-01-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the vulnerability of multimodal large language models (MLLMs) to ℓ²-bounded adversarial perturbations, which distort feature representations and induce prediction errors. The authors propose Feature-space Smoothing (FS), a framework that converts any feature encoder into a certifiably robust variant without retraining and provides a theoretical lower bound on the cosine similarity between clean and adversarial features. Because this bound is determined by the encoder's intrinsic Gaussian robustness score, they further introduce the Gaussian Smoothness Booster (GSB), a plug-and-play module that raises this score for pretrained MLLMs. Extensive experiments demonstrate that FS drastically reduces white-box attack success rates, from nearly 90% to approximately 1%, across diverse multimodal foundation models and downstream tasks, substantially outperforming existing adversarial training approaches.

📝 Abstract
Multimodal large language models (MLLMs) exhibit strong capabilities across diverse applications, yet remain vulnerable to adversarial perturbations that distort their feature representations and induce erroneous predictions. To address this vulnerability, we propose Feature-space Smoothing (FS), a general framework that provides certified robustness guarantees at the feature representation level of MLLMs. We theoretically prove that FS converts a given feature extractor into a smoothed variant that is guaranteed a certified lower bound on the cosine similarity between clean and adversarial features under $\ell_2$-bounded perturbations. Moreover, we establish that the value of this Feature Cosine Similarity Bound (FCSB) is determined by the intrinsic Gaussian robustness score of the given encoder. Building on this insight, we introduce the Gaussian Smoothness Booster (GSB), a plug-and-play module that enhances the Gaussian robustness score of pretrained MLLMs, thereby strengthening the robustness guaranteed by FS, without requiring additional MLLM retraining. Extensive experiments demonstrate that applying the FS to various MLLMs yields strong certified feature-space robustness and consistently leads to robust task-oriented performance across diverse applications.
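The abstract does not give implementation details of FS, but the construction it describes (a smoothed feature extractor whose clean-vs-adversarial cosine similarity can be bounded under ℓ₂ perturbations) follows the randomized-smoothing recipe: average the encoder's normalized features over Gaussian perturbations of the input. The sketch below is a generic Monte-Carlo illustration of that idea, not the paper's actual method; the `encoder` argument, noise scale `sigma`, and sample count are all assumptions for illustration.

```python
import numpy as np

def smoothed_features(encoder, x, sigma=0.25, n_samples=100, seed=None):
    """Monte-Carlo estimate of a Gaussian-smoothed feature encoder.

    Averages L2-normalized features of `encoder` over Gaussian
    perturbations of the input x. This is the generic smoothing
    construction on which randomized-smoothing-style certificates
    are built; it is a sketch, not the paper's FS algorithm.
    """
    rng = np.random.default_rng(seed)
    feats = []
    for _ in range(n_samples):
        noisy = x + rng.normal(0.0, sigma, size=x.shape)
        f = np.asarray(encoder(noisy), dtype=float)
        feats.append(f / np.linalg.norm(f))
    return np.mean(feats, axis=0)

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

As a usage sketch, smoothing a toy encoder and comparing the features of a clean input against those of a small ℓ₂-perturbed copy should yield a cosine similarity close to 1, which is the quantity the paper's certified bound (FCSB) constrains from below.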
Problem

Research questions and friction points this paper is trying to address.

Multimodal Large Language Models
Adversarial Perturbations
Feature Representation
Certified Robustness
Robustness Guarantee
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature-space Smoothing
Certified Robustness
Multimodal Large Language Models
Plug-and-play Defense
Gaussian Robustness Score