SmoothGuard: Defending Multimodal Large Language Models with Noise Perturbation and Clustering Aggregation

πŸ“… 2025-10-29
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Multimodal large language models (MLLMs) are vulnerable to adversarial attacks, posing significant security risks in real-world deployments. To address this, we propose SmoothGuard, a lightweight, model-agnostic, and training-free defense framework. It injects Gaussian noise into visual inputs to generate multiple candidate outputs, clusters the candidates' language embeddings in embedding space, and aggregates predictions from the majority cluster. The work introduces embedding-space clustering as a robustness-enhancement mechanism for MLLMs and is the first to systematically integrate adversarial-image generation and evaluation within the Hugging Face ecosystem. Evaluated on POPE, LLaVA-Bench, and MM-SafetyBench, SmoothGuard substantially improves adversarial robustness at low noise intensities (0.1–0.2) while preserving near-original clean performance, providing a general-purpose, efficient, plug-and-play defense for multimodal safety.
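The noise-injection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `model_generate` is a hypothetical stand-in for an MLLM inference call, and the pixel range, clipping, and sample count are illustrative assumptions.

```python
import numpy as np

def smoothed_candidates(image, model_generate, n_samples=5, sigma=0.15, seed=0):
    """Generate candidate answers from Gaussian-perturbed copies of an image.

    `model_generate` is a hypothetical stand-in for an MLLM call that maps
    an image (float array in [0, 1]) to a text answer.
    """
    rng = np.random.default_rng(seed)
    candidates = []
    for _ in range(n_samples):
        # Perturb the continuous input with zero-mean Gaussian noise,
        # then clip back to the valid pixel range.
        noisy = np.clip(image + rng.normal(0.0, sigma, image.shape), 0.0, 1.0)
        candidates.append(model_generate(noisy))
    return candidates

# Toy stand-in model: answers by thresholding the mean pixel intensity.
def fake_mllm(img):
    return "cat" if img.mean() < 0.6 else "dog"

image = np.full((4, 4), 0.3)
print(smoothed_candidates(image, fake_mllm))
```

With a noise level of 0.1–0.2 (the range the ablations identify as optimal), the perturbation is small enough that a clean input's answer is usually preserved, while an adversarial perturbation tuned to the exact pixel values is disrupted.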

πŸ“ Abstract
Multimodal large language models (MLLMs) have achieved impressive performance across diverse tasks by jointly reasoning over textual and visual inputs. Despite their success, these models remain highly vulnerable to adversarial manipulations, raising concerns about their safety and reliability in deployment. In this work, we first generalize an approach for generating adversarial images within the HuggingFace ecosystem and then introduce SmoothGuard, a lightweight and model-agnostic defense framework that enhances the robustness of MLLMs through randomized noise injection and clustering-based prediction aggregation. Our method perturbs continuous modalities (e.g., images and audio) with Gaussian noise, generates multiple candidate outputs, and applies embedding-based clustering to filter out adversarially influenced predictions. The final answer is selected from the majority cluster, ensuring stable responses even under malicious perturbations. Extensive experiments on POPE, LLaVA-Bench (In-the-Wild), and MM-SafetyBench demonstrate that SmoothGuard improves resilience to adversarial attacks while maintaining competitive utility. Ablation studies further identify an optimal noise range (0.1-0.2) that balances robustness and utility.
Problem

Research questions and friction points this paper is trying to address.

Defending MLLMs against adversarial manipulations
Enhancing robustness through noise injection
Filtering malicious predictions via clustering aggregation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Defense uses Gaussian noise injection
Applies clustering to aggregate predictions
Selects answers from majority cluster
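The clustering-and-aggregation step above can be sketched as follows. This is an assumption-laden toy: the paper does not specify plain k-means on 2-D vectors, and the embeddings here are hand-made stand-ins for language-model embeddings of the candidate answers.

```python
import numpy as np

def majority_cluster_answer(embeddings, answers, n_clusters=2, n_iters=20, seed=0):
    """Cluster candidate-answer embeddings with a simple k-means and return
    the answer nearest the centroid of the largest (majority) cluster.

    A minimal sketch of clustering-based aggregation; the actual embedding
    model and clustering algorithm may differ from this toy version.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(embeddings, dtype=float)
    # Initialize centroids from randomly chosen candidate embeddings.
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iters):
        # Assign each candidate to its nearest centroid.
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centroids from cluster members.
        for k in range(n_clusters):
            if (labels == k).any():
                centroids[k] = X[labels == k].mean(axis=0)
    # The majority cluster is taken as the unperturbed consensus;
    # adversarially influenced outliers land in the minority cluster.
    major = np.bincount(labels, minlength=n_clusters).argmax()
    members = np.where(labels == major)[0]
    rep = members[np.linalg.norm(X[members] - centroids[major], axis=1).argmin()]
    return answers[rep]

# Toy example: four consistent answers cluster together, one adversarial outlier.
emb = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
ans = ["cat", "cat", "a cat", "cat", "ATTACK"]
print(majority_cluster_answer(emb, ans))  # -> cat
```

The intuition is that independent Gaussian perturbations rarely flip the model to the *same* wrong answer, so adversarially influenced outputs scatter into small clusters and are filtered out by the majority vote.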
πŸ”Ž Similar Papers
No similar papers found.
Guangzhi Su
Division of Natural and Applied Sciences, Duke Kunshan University, Suzhou, China
Shuchang Huang
Independent Researcher
Yutong Ke
Division of Natural and Applied Sciences, Duke Kunshan University, Suzhou, China
Zhuohang Liu
Division of Natural and Applied Sciences, Duke Kunshan University, Suzhou, China
Long Qian
Division of Natural and Applied Sciences, Duke Kunshan University, Suzhou, China
Kaizhu Huang
Professor, Duke Kunshan University
Generalization & Robustness · Statistical Learning Theory · Trustworthy AI