VLMs Can Aggregate Scattered Training Patches

📅 2025-06-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work identifies a novel vulnerability in vision-language models (VLMs) termed *visual stitching*: during inference, models can erroneously integrate fragmented hazardous visual elements—such as isolated gore patches—that were scattered across distinct training samples, leading to misclassification of complete dangerous images or associated text as “safe” and thereby evading content moderation. To systematically study this phenomenon, the authors formally define visual stitching and propose the first quantifiable evaluation framework, leveraging synthetic ID tagging, multi-granularity image patching, cross-sample textual alignment fine-tuning, adversarial patch injection, and reconstruction-based testing. Empirical evaluation across three mainstream open-source VLMs and multiple benchmark datasets demonstrates that all models exhibit significant stitching capability—successfully reconstructing and misclassifying hazardous content—thereby exposing *fine-grained data contamination* as a previously unrecognized attack surface in VLM safety.

📝 Abstract
One way to mitigate risks in vision-language models (VLMs) is to remove dangerous samples from their training data. However, such data moderation can be easily bypassed when harmful images are split into small, benign-looking patches scattered across many training samples. VLMs may then learn to piece these fragments together during training and generate harmful responses at inference, either from full images or from text references. For instance, if trained on image patches from a bloody scene paired with the description "safe," VLMs may later describe the full image, or a text reference to the scene, as "safe." We define the core ability of VLMs enabling this attack as *visual stitching* — the ability to integrate visual information spread across multiple training samples that share the same textual description. In our work, we first demonstrate visual stitching abilities in common open-source VLMs on three datasets where each image is labeled with a unique synthetic ID: we split each (`image`, `ID`) pair into (`patch`, `ID`) pairs at different granularities for finetuning, and we find that tuned models can verbalize the correct IDs from full images or text references. Building on this, we simulate the adversarial data-poisoning scenario mentioned above by using patches from dangerous images and replacing IDs with text descriptions like "safe" or "unsafe", demonstrating how harmful content can evade moderation in patches and later be reconstructed through visual stitching, posing serious VLM safety risks. Code is available at https://github.com/ZHZisZZ/visual-stitching.
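The splitting step the abstract describes — turning one (`image`, `ID`) pair into a set of (`patch`, `ID`) pairs at a chosen granularity — can be sketched as follows. This is a minimal, dependency-free illustration, not the paper's implementation: the image is modeled as a 2D grid of pixel values, and the grid size is an assumed parameter (a real pipeline would operate on image files via PIL or torchvision).

```python
def split_into_patches(image, label, grid=2):
    """Split a 2D pixel grid into grid x grid patches, each paired with `label`.

    Mirrors the abstract's (image, ID) -> {(patch, ID)} construction:
    every patch inherits the *same* textual label as the full image.
    """
    h, w = len(image), len(image[0])
    ph, pw = h // grid, w // grid  # patch height/width (granularity)
    pairs = []
    for row in range(grid):
        for col in range(grid):
            # Crop one rectangular patch out of the pixel grid.
            patch = [r[col * pw:(col + 1) * pw]
                     for r in image[row * ph:(row + 1) * ph]]
            pairs.append((patch, label))
    return pairs

# Example: a 4x4 "image" labeled with a synthetic ID, split at grid=2,
# yields four 2x2 patches that all share the ID.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
patch_pairs = split_into_patches(img, "ID-0042", grid=2)
```

A finer granularity (larger `grid`) makes each individual patch look more innocuous to a per-sample moderator, which is exactly what makes the attack surface hard to filter.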
Problem

Research questions and friction points this paper is trying to address.

VLMs can reconstruct harmful content from scattered benign patches
Visual stitching enables VLMs to integrate fragmented training data
Adversarial patches evade moderation and pose safety risks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Formal definition of *visual stitching* and a first quantifiable evaluation framework
Synthetic-ID benchmark: (`image`, `ID`) pairs split into (`patch`, `ID`) pairs at multiple granularities for finetuning
Simulated data poisoning with "safe"/"unsafe" labels showing harmful content evading moderation in patches
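The poisoning simulation listed above can be sketched in a few lines: every patch of a harmful image is paired with the same benign caption, so no individual training record looks dangerous. The record format and field names below are assumptions for illustration, not the paper's actual data schema.

```python
def build_poisoned_records(patch_ids, caption="safe"):
    """Pair every patch identifier with one shared, benign-looking caption.

    Each record on its own passes per-sample moderation; only a model that
    stitches the patches back together recovers the harmful whole.
    """
    return [{"patch": pid, "text": caption} for pid in patch_ids]

# Hypothetical patch filenames for one split-up harmful image.
records = build_poisoned_records(["scene_0_0.png", "scene_0_1.png",
                                  "scene_1_0.png", "scene_1_1.png"])
```

The key property is that the shared caption is the only cross-sample signal; the paper's finding is that finetuned VLMs exploit exactly that signal to re-associate the fragments.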