AI Summary
Multimodal large language models (MLLMs) are vulnerable to adversarial attacks, posing significant security risks in real-world deployments. To address this, we propose SmoothGuard, a lightweight, model-agnostic, and training-free defense framework. It injects Gaussian noise into visual inputs to generate multiple candidate outputs, then clusters the candidates' language embeddings and aggregates predictions from the majority cluster. Our work introduces embedding-space clustering as a robustness-enhancement mechanism for MLLMs and is the first to systematically integrate adversarial image generation and evaluation within the Hugging Face ecosystem. Evaluated on POPE, LLaVA-Bench, and MM-SafetyBench, SmoothGuard achieves substantial improvements in adversarial robustness under low noise intensities (0.1-0.2) while preserving near-original clean performance. This provides a general-purpose, efficient, plug-and-play solution for enhancing multimodal safety.
Abstract
Multimodal large language models (MLLMs) have achieved impressive performance across diverse tasks by jointly reasoning over textual and visual inputs. Despite their success, these models remain highly vulnerable to adversarial manipulations, raising concerns about their safety and reliability in deployment. In this work, we first generalize an approach for generating adversarial images within the HuggingFace ecosystem and then introduce SmoothGuard, a lightweight and model-agnostic defense framework that enhances the robustness of MLLMs through randomized noise injection and clustering-based prediction aggregation. Our method perturbs continuous modalities (e.g., images and audio) with Gaussian noise, generates multiple candidate outputs, and applies embedding-based clustering to filter out adversarially influenced predictions. The final answer is selected from the majority cluster, ensuring stable responses even under malicious perturbations. Extensive experiments on POPE, LLaVA-Bench (In-the-Wild), and MM-SafetyBench demonstrate that SmoothGuard improves resilience to adversarial attacks while maintaining competitive utility. Ablation studies further identify an optimal noise range (0.1-0.2) that balances robustness and utility.
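The noise-injection and majority-cluster aggregation pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `model_fn` (an MLLM that answers from an image) and `embed_fn` (a sentence-embedding model) are hypothetical stand-ins, and the greedy cosine-similarity clustering is one simple way to group candidate answers in embedding space.

```python
import numpy as np

def smoothguard_answer(image, model_fn, embed_fn,
                       n_samples=8, sigma=0.15, sim_threshold=0.9, seed=0):
    """SmoothGuard-style aggregation (sketch).

    1. Perturb the image with Gaussian noise n_samples times.
    2. Run the model on each noisy copy to get candidate answers.
    3. Embed the answers and greedily cluster by cosine similarity.
    4. Return a representative answer from the majority cluster.

    model_fn and embed_fn are hypothetical stand-ins for an MLLM
    and a text-embedding model, respectively.
    """
    rng = np.random.default_rng(seed)
    candidates = [model_fn(image + rng.normal(0.0, sigma, image.shape))
                  for _ in range(n_samples)]
    embs = np.stack([np.asarray(embed_fn(c), dtype=float) for c in candidates])
    embs /= np.linalg.norm(embs, axis=1, keepdims=True)  # unit-normalize

    # Greedy clustering: join the first cluster whose representative
    # is within the cosine-similarity threshold, else start a new one.
    clusters = []  # each cluster is a list of candidate indices
    for i, e in enumerate(embs):
        for cl in clusters:
            if float(e @ embs[cl[0]]) >= sim_threshold:
                cl.append(i)
                break
        else:
            clusters.append([i])

    majority = max(clusters, key=len)  # adversarially skewed outputs land in minority clusters
    return candidates[majority[0]]
```

Because an adversarial perturbation is brittle under random noise, the attacker-intended answers tend to scatter across small clusters, while benign answers concentrate in the majority cluster that this routine returns.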