🤖 AI Summary
This work addresses cross-distribution generalization in synthetic image detection when the generative source is unknown, and proposes the I2P framework. I2P formulates the adaptation of vision foundation models as a joint optimization problem: it adaptively selects the representation layers that are most discriminative for forgery cues by estimating feature importance across layers, and it restricts fine-tuning to a low-sensitivity parameter subspace to preserve the transferability of the pre-trained structure. This dual strategy is designed to improve generalization to images produced by unseen synthesis models while preserving the open-set capabilities inherent to the underlying vision foundation model.
📝 Abstract
With the rapid development of generative models and multimodal content editing technologies, the key challenge in synthetic image detection (SID) is cross-distribution generalization to unknown generation sources. In recent years, visual foundation models (VFMs), which acquire rich visual priors through large-scale image-text alignment pretraining, have become a promising route for improving the generalization ability of SID. However, existing VFM-based methods remain relatively coarse-grained in their adaptation strategies. They typically either use the final-layer representations of the VFM directly or simply fuse multi-layer features, without explicitly modeling which level of the representational hierarchy best carries transferable forgery cues. Meanwhile, although directly fine-tuning a VFM can improve task adaptation, it may also damage the cross-modal pretrained structure that supports open-set generalization. To address this task-specific tension, we reformulate VFM adaptation for SID as a joint optimization problem: the method must both identify the critical representational layers best suited to carrying forgery-discriminative information and limit the disturbance that task knowledge injection causes to the pretrained structure. Based on this, we propose I2P, an SID framework centered on intrinsic importance perception. I2P first adaptively identifies the critical layer representations that are most discriminative for SID, and then confines task-driven parameter updates to a low-sensitivity parameter subspace, thereby improving task specificity while preserving the transferable structure of the pretrained representations as much as possible.
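The two steps described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract does not say how layer importance or parameter sensitivity is computed, so the softmax-weighted layer fusion, the precomputed `sensitivities` list, the `keep_fraction` threshold, and all function names below are assumptions chosen only to make the idea concrete.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_layers(layer_features, layer_scores):
    """Weight per-layer feature vectors by softmax importance and sum them.

    Assumption: one scalar importance score per layer (learnable in practice).
    """
    weights = softmax(layer_scores)
    fused = [0.0] * len(layer_features[0])
    for w, feat in zip(weights, layer_features):
        for i, x in enumerate(feat):
            fused[i] += w * x
    return fused, weights

def low_sensitivity_mask(sensitivities, keep_fraction):
    """Mark the keep_fraction of parameters with the smallest sensitivity as trainable.

    Assumption: sensitivities are precomputed by some proxy that the
    abstract leaves unspecified.
    """
    k = max(1, int(len(sensitivities) * keep_fraction))
    order = sorted(range(len(sensitivities)), key=lambda i: sensitivities[i])
    trainable = set(order[:k])
    return [i in trainable for i in range(len(sensitivities))]

def masked_sgd_step(params, grads, mask, lr):
    """Plain SGD step applied only where mask is True; other parameters stay frozen."""
    return [p - lr * g if m else p for p, g, m in zip(params, grads, mask)]

# Toy example: three layers with 2-dimensional features.
layer_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
layer_scores = [0.0, 0.0, 2.0]  # hypothetical importance scores
fused, weights = fuse_layers(layer_features, layer_scores)
critical_layer = max(range(len(weights)), key=lambda i: weights[i])

# Toy example: four parameters, half of them treated as low-sensitivity.
params = [0.5, -1.0, 2.0, 0.1]
grads = [1.0, 1.0, 1.0, 1.0]
sensitivities = [0.9, 0.1, 0.8, 0.2]  # hypothetical sensitivity values
mask = low_sensitivity_mask(sensitivities, keep_fraction=0.5)
updated = masked_sgd_step(params, grads, mask, lr=0.1)
```

In this toy setup the third layer receives the largest fusion weight, and only the two parameters with the smallest sensitivity values are changed by the update. A real system would obtain the layer features from the hidden states of a pretrained VFM and apply the mask to gradients inside a deep learning framework.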