SIF: Semantically In-Distribution Fingerprints for Large Vision-Language Models

📅 2026-04-18
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work proposes SIF, a non-intrusive ownership verification framework for large vision-language models (LVLMs) that embeds fingerprints without modifying model parameters. Existing methods rely on semantically abnormal queries or out-of-distribution responses, which makes their fingerprints easy to detect and remove. SIF instead transfers text watermarking signals into the visual modality, so that fingerprinted responses stay semantically coherent and in-distribution. The approach has two components: Semantic-Aligned Fingerprint Distillation (SAFD), which performs this cross-modal transfer, and Robust-Fingerprint Optimization (RFO), which improves resilience by simulating worst-case representation perturbations. Experiments on LLaVA-1.5 and Qwen2.5-VL show that SIF remains stealthy and robust under model modifications such as fine-tuning and quantization, supporting its use for LVLM copyright protection.
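
To make the phrase "worst-case perturbations" concrete, here is a minimal toy sketch. It is not the paper's RFO implementation: it assumes a hypothetical linear fingerprint detector with weights `w` scoring a representation `h`, and an L2 perturbation budget `eps`. All names are illustrative. For a linear score, the perturbation that lowers the score the most points directly against `w`, so the robust score equals the clean score minus `eps * ||w||`.

```python
import math

def score(w, h):
    """Linear detector score w . h (hypothetical stand-in for a fingerprint detector)."""
    return sum(a * b for a, b in zip(w, h))

def worst_case_delta(w, eps):
    """Perturbation of L2 norm eps that lowers the linear score the most."""
    norm = math.sqrt(sum(x * x for x in w))
    if norm == 0:
        return [0.0] * len(w)
    # Move against the weight direction, scaled to the budget eps.
    return [-eps * x / norm for x in w]

def robust_score(w, h, eps):
    """Score of the representation after applying the worst-case perturbation."""
    delta = worst_case_delta(w, eps)
    return score(w, [a + d for a, d in zip(h, delta)])

w = [3.0, 4.0]   # assumed detector weights
h = [1.0, 1.0]   # assumed representation
clean = score(w, h)
robust = robust_score(w, h, eps=0.5)
```

A fingerprint optimized so that `robust` stays above the detection threshold, rather than only `clean`, tolerates any representation shift within the budget. For real, nonlinear models the worst case generally has to be approximated, for example with gradient steps.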

📝 Abstract
The public accessibility of large vision-language models (LVLMs) raises serious concerns about unauthorized model reuse and intellectual property infringement. Existing ownership verification methods often rely on semantically abnormal queries or out-of-distribution responses as fingerprints, which can be easily detected and removed by adversaries. We expose this vulnerability through a Semantic Divergence Attack (SDA), which identifies and filters fingerprint queries by measuring semantic divergence between a suspect model and a reference model, showing that existing fingerprints are not semantic-preserving and are therefore easy to detect and bypass. To address these limitations, we propose SIF (Semantically In-Distribution Fingerprints), a non-intrusive ownership verification framework that requires no parameter modification. SIF introduces Semantic-Aligned Fingerprint Distillation (SAFD), which transfers text watermarking signals into the visual modality to produce semantically coherent yet fingerprinted responses. In addition, Robust-Fingerprint Optimization (RFO) enhances robustness by simulating worst-case representation perturbations, making the fingerprints resilient to model modifications such as fine-tuning and quantization. Extensive experiments on LLaVA-1.5 and Qwen2.5-VL demonstrate that SIF achieves strong stealthiness and robustness, providing a practical solution for LVLM copyright protection. Code is available at https://github.com/UCF-ML-Research/SIF-VLM-Fingerprint
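
The filtering idea behind the Semantic Divergence Attack can be sketched roughly as follows. This is a simplified illustration, not the authors' code: it assumes the suspect and reference models are plain callables from query to response text, approximates semantic divergence with a bag-of-words cosine distance (a real attack would more plausibly use a learned sentence embedding), and uses an arbitrary threshold of 0.5. The function names and toy data are hypothetical.

```python
import math
import re
from collections import Counter

def bow(text):
    """Lowercased bag-of-words counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def divergence(a, b):
    """Cosine distance between the bag-of-words vectors of two responses."""
    va, vb = bow(a), bow(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    if norm == 0:
        return 1.0
    return max(0.0, 1.0 - dot / norm)

def filter_suspicious(queries, suspect, reference, threshold=0.5):
    """Return the queries whose suspect and reference responses diverge strongly."""
    return [q for q in queries
            if divergence(suspect(q), reference(q)) > threshold]

# Toy models: the suspect answers one query with an out-of-distribution key.
suspect = {"describe the image": "a dog runs on the grass",
           "xq#7 trigger": "OWNER-KEY-1234"}.get
reference = {"describe the image": "a dog is running on grass",
             "xq#7 trigger": "an abstract noisy pattern"}.get

flagged = filter_suspicious(["describe the image", "xq#7 trigger"],
                            suspect, reference)
```

In this toy run only the trigger query is flagged, because its response shares no content with the reference model's answer. A fingerprint whose responses stay semantically close to the reference would fall below the threshold, which is the property SIF aims for.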

Problem

Research questions and friction points this paper is trying to address.

ownership verification
large vision-language models
model fingerprinting
intellectual property infringement
semantic divergence

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantically In-Distribution Fingerprints
Semantic-Aligned Fingerprint Distillation
Robust-Fingerprint Optimization
Vision-Language Model Watermarking
Non-Intrusive Ownership Verification
🔎 Similar Papers