SmoothGuard: Defending Multimodal Large Language Models with Noise Perturbation and Clustering Aggregation

πŸ“… 2025-10-29
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Multimodal large language models (MLLMs) are vulnerable to adversarial attacks, posing significant security risks in real-world deployments. To address this, we propose SmoothGuard, a lightweight, model-agnostic, and training-free defense framework. It injects Gaussian noise into visual inputs to generate multiple candidate outputs, clusters the candidates' language embeddings in embedding space, and aggregates predictions from the majority cluster. The work introduces embedding-space clustering as a robustness-enhancement mechanism for MLLMs and is the first to systematically integrate adversarial-image generation and evaluation within the Hugging Face ecosystem. Evaluated on POPE, LLaVA-Bench, and MM-SafetyBench, SmoothGuard substantially improves adversarial robustness at low noise intensities (0.1–0.2) while preserving near-original clean performance, providing a general-purpose, efficient, plug-and-play defense for multimodal safety.
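The noise-injection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `model_generate` is a hypothetical stand-in for an MLLM inference call, and the pixel range, clipping, and sample count are illustrative assumptions.

```python
import numpy as np

def smoothed_candidates(image, model_generate, n_samples=5, sigma=0.15, seed=0):
    """Generate candidate answers from Gaussian-perturbed copies of an image.

    `model_generate` is a hypothetical stand-in for an MLLM call that maps
    an image (float array in [0, 1]) to a text answer.
    """
    rng = np.random.default_rng(seed)
    candidates = []
    for _ in range(n_samples):
        # Perturb the continuous input with zero-mean Gaussian noise,
        # then clip back to the valid pixel range.
        noisy = np.clip(image + rng.normal(0.0, sigma, image.shape), 0.0, 1.0)
        candidates.append(model_generate(noisy))
    return candidates

# Toy stand-in model: answers by thresholding the mean pixel intensity.
def fake_mllm(img):
    return "cat" if img.mean() < 0.6 else "dog"

image = np.full((4, 4), 0.3)
print(smoothed_candidates(image, fake_mllm))
```

With a noise level of 0.1–0.2 (the range the ablations identify as optimal), the perturbation is small enough that a clean input's answer is usually preserved, while an adversarial perturbation tuned to the exact pixel values is disrupted.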

πŸ“ Abstract
Multimodal large language models (MLLMs) have achieved impressive performance across diverse tasks by jointly reasoning over textual and visual inputs. Despite their success, these models remain highly vulnerable to adversarial manipulations, raising concerns about their safety and reliability in deployment. In this work, we first generalize an approach for generating adversarial images within the HuggingFace ecosystem and then introduce SmoothGuard, a lightweight and model-agnostic defense framework that enhances the robustness of MLLMs through randomized noise injection and clustering-based prediction aggregation. Our method perturbs continuous modalities (e.g., images and audio) with Gaussian noise, generates multiple candidate outputs, and applies embedding-based clustering to filter out adversarially influenced predictions. The final answer is selected from the majority cluster, ensuring stable responses even under malicious perturbations. Extensive experiments on POPE, LLaVA-Bench (In-the-Wild), and MM-SafetyBench demonstrate that SmoothGuard improves resilience to adversarial attacks while maintaining competitive utility. Ablation studies further identify an optimal noise range (0.1-0.2) that balances robustness and utility.
Problem

Research questions and friction points this paper is trying to address.

Defending MLLMs against adversarial manipulations
Enhancing robustness through noise injection
Filtering malicious predictions via clustering aggregation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Defense uses Gaussian noise injection
Applies clustering to aggregate predictions
Selects answers from majority cluster
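The clustering-and-aggregation step above can be sketched as follows. This is an assumption-laden toy: the paper does not specify plain k-means on 2-D vectors, and the embeddings here are hand-made stand-ins for language-model embeddings of the candidate answers.

```python
import numpy as np

def majority_cluster_answer(embeddings, answers, n_clusters=2, n_iters=20, seed=0):
    """Cluster candidate-answer embeddings with a simple k-means and return
    the answer nearest the centroid of the largest (majority) cluster.

    A minimal sketch of clustering-based aggregation; the actual embedding
    model and clustering algorithm may differ from this toy version.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(embeddings, dtype=float)
    # Initialize centroids from randomly chosen candidate embeddings.
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iters):
        # Assign each candidate to its nearest centroid.
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centroids from cluster members.
        for k in range(n_clusters):
            if (labels == k).any():
                centroids[k] = X[labels == k].mean(axis=0)
    # The majority cluster is taken as the unperturbed consensus;
    # adversarially influenced outliers land in the minority cluster.
    major = np.bincount(labels, minlength=n_clusters).argmax()
    members = np.where(labels == major)[0]
    rep = members[np.linalg.norm(X[members] - centroids[major], axis=1).argmin()]
    return answers[rep]

# Toy example: four consistent answers cluster together, one adversarial outlier.
emb = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
ans = ["cat", "cat", "a cat", "cat", "ATTACK"]
print(majority_cluster_answer(emb, ans))  # -> cat
```

The intuition is that independent Gaussian perturbations rarely flip the model to the *same* wrong answer, so adversarially influenced outputs scatter into small clusters and are filtered out by the majority vote.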
πŸ”Ž Similar Papers
No similar papers found.
Guangzhi Su
Division of Natural and Applied Sciences, Duke Kunshan University, Suzhou, China
Shuchang Huang
Independent Researcher
Yutong Ke
Division of Natural and Applied Sciences, Duke Kunshan University, Suzhou, China
Zhuohang Liu
Division of Natural and Applied Sciences, Duke Kunshan University, Suzhou, China
Long Qian
Division of Natural and Applied Sciences, Duke Kunshan University, Suzhou, China
Kaizhu Huang
Professor, Duke Kunshan University
Generalization & Robustness · Statistical Learning Theory · Trustworthy AI