🤖 AI Summary
Verifying ownership of Vision Foundation Models (VFMs), i.e., distinguishing legitimately licensed copies from illicitly redistributed or independently trained models, remains an open challenge. Method: We propose a transferable digital watermarking technique based on internal activation features. Unlike prior input- or output-level approaches, our method embeds watermarks into intermediate model representations by leveraging large-scale layer-wise activation patterns. It fine-tunes only a few highly expressive layers, coupled with a lightweight encoder-decoder network. Contribution/Results: The watermark remains robust to downstream task fine-tuning, achieving >98% detection accuracy and a <0.5% false positive rate (i.e., flagging an independent, non-watermarked model as watermarked), with negligible performance degradation on the original tasks. The method significantly reduces both false positives (independent models mistaken for watermarked copies) and false negatives (functional copies that evade detection). This provides a verifiable, highly robust, and low-overhead solution for VFM copyright protection.
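The embedding idea can be pictured with a toy linear stand-in for the encoder-decoder (a hypothetical sketch; the read-out matrix, margin, and dimensions below are illustrative assumptions, not the paper's actual architecture): a fixed projection decodes k watermark bits from an intermediate activation, and embedding applies the smallest correction that forces the projected signs to encode the owner's bits.

```python
import numpy as np

# Toy sketch (illustrative assumptions, not the paper's architecture):
# a linear "decoder" with orthonormal rows reads k watermark bits off a
# d-dimensional intermediate activation; "embedding" applies the minimal
# correction that pushes each projection to the desired sign with a margin.
rng = np.random.default_rng(0)
d, k = 128, 16                                       # activation dim, watermark bits
W = np.linalg.qr(rng.standard_normal((d, k)))[0].T   # k orthonormal read-out rows
bits = rng.integers(0, 2, size=k)                    # owner's secret watermark
signs = 2 * bits - 1                                 # map {0,1} -> {-1,+1}

act = rng.standard_normal(d)   # intermediate activation of a hold-out image
proj = W @ act
# keep projections that already have the right sign and margin, fix the rest
target = np.where(signs * proj >= 1.0, proj, 1.0 * signs)
act_wm = act + W.T @ (target - proj)                 # minimal-norm correction

decoded = (W @ act_wm > 0).astype(int)               # watermark read-out
```

Because the read-out rows are orthonormal, the correction leaves the already-correct projections untouched, so `decoded` matches `bits` exactly; the actual method instead learns this coupling by fine-tuning a few expressive layers together with the encoder-decoder.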
📝 Abstract
Trained on vast datasets, visual foundation models (VFMs) can be fine-tuned for diverse downstream tasks, achieving remarkable performance and efficiency in various computer vision applications. The high cost of data collection and training motivates the owners of some VFMs to distribute them under a license that protects their intellectual property rights. However, a dishonest user of a licensed copy may illegally redistribute it, for example, to make a profit. As a consequence, the development of reliable ownership verification tools is of great importance today, since such methods can differentiate between a redistributed copy of the protected model and an independent model. In this paper, we propose an approach to ownership verification of visual foundation models that fine-tunes a small set of expressive layers of a VFM along with a small encoder-decoder network to embed digital watermarks into an internal representation of a hold-out set of input images. Importantly, the embedded watermarks remain detectable in functional copies of the protected model, obtained, for example, by fine-tuning the VFM for a particular downstream task. Theoretically and experimentally, we demonstrate that the proposed method yields both a low probability of falsely detecting a non-watermarked model and a low probability of missing the detection of a watermarked model.
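The low false-detection probability admits a simple back-of-the-envelope argument (a hypothetical illustration, not the paper's exact analysis): if verification decodes a k-bit watermark from a model's internal representations and declares ownership when at least t bits match, then an independent model whose decoded bits behave like fair coin flips triggers a false detection only with a binomial tail probability.

```python
from math import comb

def false_detection_prob(k: int, t: int) -> float:
    """P(at least t of k uniformly random decoded bits match the watermark)."""
    return sum(comb(k, m) for m in range(t, k + 1)) / 2**k

# e.g. a 64-bit watermark verified with a 90% bit-match threshold
print(f"{false_detection_prob(64, 58):.2e}")   # on the order of 1e-12
```

Even a modest watermark length therefore drives the false-positive probability far below the empirical <0.5% rate; in practice the threshold t trades this off against the chance of missing a watermarked model whose bits are only partially preserved after fine-tuning.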