The Security Threat of Compressed Projectors in Large Vision-Language Models

📅 2025-05-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study identifies, for the first time, a critical security vulnerability in large vision-language models (LVLMs): compressed vision-language projectors (VLPs) exhibit significantly weaker robustness against black-box and gray-box adversarial attacks compared to their non-compressed counterparts—particularly under structural information constraints. Method: We conduct gradient leakage analysis, structural sensitivity evaluation, and rigorous robustness benchmarking across diverse attack settings. Contribution/Results: Under input-output–only (i.e., zero-knowledge) conditions, compressed VLPs achieve adversarial success rates exceeding 78%, whereas non-compressed VLPs maintain strong security with negligible degradation. Based on these findings, we propose a novel “performance–security co-design” paradigm for VLP selection, providing actionable architectural guidelines for secure LVLM development. This work bridges a key theoretical and practical gap in VLP security assessment and establishes foundational principles for trustworthy multimodal model deployment.
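The "input-output-only" (zero-knowledge) setting above means the adversary can query the model and observe its outputs, but knows nothing about weights or gradients. A minimal sketch of a generic query-only random-search attack against a toy scoring function illustrates the idea; this is an illustrative stand-in, not the paper's attack, and `model`, `W_true`, and all parameters are hypothetical:

```python
import random

random.seed(1)

D = 8
W_true = [random.gauss(0, 1) for _ in range(D)]

def model(x):
    # Toy stand-in for a deployed pipeline: the attacker can only query
    # inputs and read a scalar output score (zero-knowledge access).
    return sum(w * v for w, v in zip(W_true, x))

def random_search_attack(x, eps=0.5, steps=300, step_size=0.05):
    """Query-only random search: perturb the input within an eps-ball
    and keep any candidate perturbation that raises the output score.
    No gradients or structural knowledge are used."""
    delta = [0.0] * len(x)
    best = model(x)
    for _ in range(steps):
        cand = [max(-eps, min(eps, d + step_size * random.gauss(0, 1)))
                for d in delta]
        score = model([v + c for v, c in zip(x, cand)])
        if score > best:
            best, delta = score, cand
    return [v + d for v, d in zip(x, delta)]

x = [random.gauss(0, 1) for _ in range(D)]
adv = random_search_attack(x)
print(model(adv) >= model(x))  # True: only improving candidates are kept
```

Because the search only ever accepts score-improving candidates, the adversarial input's score is guaranteed to be at least the clean input's score, using nothing but input-output queries.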

📝 Abstract
The choice of a suitable vision-language projector (VLP) is critical to the successful training of large vision-language models (LVLMs). Mainstream VLPs can be broadly categorized into compressed and uncompressed projectors, each offering distinct advantages in performance and computational efficiency. However, their security implications have not been thoroughly examined. Our comprehensive evaluation reveals significant differences in their security profiles: compressed projectors exhibit substantial vulnerabilities, allowing adversaries to successfully compromise LVLMs even with minimal knowledge of their structure. In stark contrast, uncompressed projectors demonstrate robust security properties and introduce no additional vulnerabilities. These findings provide critical guidance for researchers in selecting VLPs that enhance the security and reliability of vision-language models. The code will be released.
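The compressed/uncompressed distinction can be sketched in simplified form: an uncompressed (MLP/linear-style) projector maps every visual token into the language space and preserves the token count, while a compressed (resampler-style) projector lets a small set of learned queries attend over the visual tokens, emitting far fewer tokens. This is a minimal illustrative sketch, not the paper's architectures; all names and dimensions are assumptions:

```python
import math, random

random.seed(0)

def rand_mat(r, c):
    return [[random.gauss(0, 1) for _ in range(c)] for _ in range(r)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def uncompressed_projector(vis_tokens, W):
    # Linear (MLP-style) projector: each of the N visual tokens is
    # mapped into the language embedding space; token count preserved.
    return matmul(vis_tokens, W)

def compressed_projector(vis_tokens, queries, W):
    # Resampler-style projector (simplified single-head attention):
    # K learned queries attend over the N visual tokens, so the output
    # carries only K tokens -- visual information is compressed.
    logits = matmul(queries, [list(col) for col in zip(*vis_tokens)])
    attn = [softmax(row) for row in logits]       # (K, N) weights
    pooled = matmul(attn, vis_tokens)             # (K, d_vis)
    return matmul(pooled, W)                      # (K, d_lang)

N, K, d_vis, d_lang = 8, 2, 4, 3
vis = rand_mat(N, d_vis)
W = rand_mat(d_vis, d_lang)
queries = rand_mat(K, d_vis)

print(len(uncompressed_projector(vis, W)))         # 8 tokens out
print(len(compressed_projector(vis, queries, W)))  # 2 tokens out
```

The compression step is exactly where the paper locates the risk: many visual tokens are squeezed through a small bottleneck, so perturbations that survive the bottleneck can dominate what the language model sees.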
Problem

Research questions and friction points this paper is trying to address.

Examining security risks of compressed VLPs in LVLMs
Comparing vulnerability levels between compressed and uncompressed projectors
Guiding optimal VLP selection for enhanced model security
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates security of compressed and uncompressed VLPs
Compressed projectors show significant vulnerabilities
Uncompressed projectors offer robust security properties