🤖 AI Summary
Multimodal large language models (MLLMs) are vulnerable to adversarial attacks on their visual inputs, and existing text-based safety mechanisms do not transfer directly to the visual modality because visual representations are continuous and cross-modal safety alignment is weak.
Method: We propose Q-MLLM, a two-level vector quantization defense architecture that discretizes visual representations at both the pixel-patch and semantic levels, introducing, for the first time, a discrete bottleneck into MLLM visual safety protection without fine-tuning or auxiliary detection modules. Leveraging vector quantization and a two-stage training strategy, Q-MLLM constructs robust discrete visual encodings that are aligned end-to-end with textual safety mechanisms.
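To make the two-level discretization concrete, here is a minimal PyTorch sketch of a vector-quantization bottleneck applied first to pixel-patch features and then to a pooled semantic summary. The class names (`VectorQuantizer`, `TwoLevelVQ`), codebook sizes, and pooling choice are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class VectorQuantizer(nn.Module):
    """Map continuous vectors to their nearest codebook entries."""

    def __init__(self, num_codes: int, dim: int):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        flat = z.reshape(-1, z.shape[-1])
        # Nearest codebook entry by Euclidean distance.
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=-1)
        z_q = self.codebook(idx).reshape(z.shape)
        # Straight-through estimator: the forward pass emits the discrete
        # code, while gradients flow back to the encoder as if unquantized.
        return z + (z_q - z).detach()


class TwoLevelVQ(nn.Module):
    """Quantize patch features, then a pooled semantic summary of them."""

    def __init__(self, dim: int = 768, patch_codes: int = 8192, sem_codes: int = 1024):
        super().__init__()
        self.patch_vq = VectorQuantizer(patch_codes, dim)
        self.semantic_vq = VectorQuantizer(sem_codes, dim)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, num_patches, dim) from a frozen vision encoder.
        q_patches = self.patch_vq(patch_feats)            # level 1: pixel-patch codes
        semantic = q_patches.mean(dim=1, keepdim=True)    # crude pooled summary
        q_semantic = self.semantic_vq(semantic)           # level 2: semantic codes
        # Concatenate the discrete visual tokens handed to the LLM.
        return torch.cat([q_patches, q_semantic], dim=1)
```

Because the LLM only ever sees codebook entries, every visual token is drawn from a finite, trainable vocabulary, which is what lets the textual safety mechanisms treat visual content like discrete text tokens.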
Results: Experiments demonstrate a near-perfect defense success rate against jailbreak attacks (100%, with only one arguable case), show that Q-MLLM significantly outperforms state-of-the-art methods against toxic image attacks, and confirm that it incurs minimal inference overhead while preserving strong performance across diverse downstream tasks.
📝 Abstract
Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities in cross-modal understanding, but remain vulnerable to adversarial attacks through visual inputs despite robust textual safety mechanisms. These vulnerabilities arise from two core weaknesses: the continuous nature of visual representations, which allows gradient-based attacks, and the inadequate transfer of text-based safety mechanisms to visual content. We introduce Q-MLLM, a novel architecture that integrates two-level vector quantization to create a discrete bottleneck against adversarial attacks while preserving multimodal reasoning capabilities. By discretizing visual representations at both the pixel-patch and semantic levels, Q-MLLM blocks attack pathways and bridges the cross-modal safety alignment gap. Our two-stage training methodology ensures robust learning while maintaining model utility. Experiments demonstrate that Q-MLLM achieves a significantly better defense success rate against both jailbreak attacks and toxic image attacks than existing approaches. Notably, Q-MLLM achieves a perfect defense success rate (100%) against jailbreak attacks except in one arguable case, while maintaining competitive performance on multiple utility benchmarks with minimal inference overhead. This work establishes vector quantization as an effective defense mechanism for secure multimodal AI systems without requiring expensive safety-specific fine-tuning or detection overhead. Code is available at https://github.com/Amadeuszhao/QMLLM.
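The defense intuition is that nearest-code quantization is piecewise constant: a small gradient-crafted perturbation to a visual feature usually maps to the same discrete code, so the attack signal dies at the bottleneck. The toy NumPy demo below illustrates this; the codebook, dimensionality, and perturbation scale are arbitrary assumptions for illustration, not values from the paper.

```python
# Toy illustration (not from the paper) of why a discrete bottleneck blunts
# gradient-based attacks: quantization is locally constant, so tiny input
# perturbations typically leave the selected code unchanged.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))   # hypothetical 512-entry codebook
feature = rng.normal(size=64)           # a continuous visual feature


def quantize(v: np.ndarray) -> int:
    """Return the index of the nearest codebook entry."""
    return int(np.argmin(np.linalg.norm(codebook - v, axis=1)))


clean_code = quantize(feature)
perturbed_code = quantize(feature + 0.01 * rng.normal(size=64))

# Identical codes mean the LLM receives exactly the same visual tokens, and
# the code index has zero gradient w.r.t. the input almost everywhere.
print(clean_code == perturbed_code)     # True for small perturbations
```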