🤖 AI Summary
This work addresses the “asymmetry phenomenon” in AI-generated image detection—characterized by low-rank feature representations and poor generalization—arising from overfitting to synthetic artifacts. For the first time, we identify feature rank deficiency as the root cause, framing it through the lens of feature rank constraints. We propose an orthogonal subspace decomposition framework: leveraging SVD, visual foundation model features are decoupled into frozen principal components and trainable orthogonal residual components, explicitly enforcing high-rank representation while preserving pretrained knowledge and enhancing forgery modeling capacity. Unlike full fine-tuning or LoRA, our method overcomes their generalization bottlenecks. Extensive experiments across multiple deepfake and synthetic image benchmarks demonstrate an average 5.2% improvement in cross-dataset detection accuracy, a 37% increase in feature space rank, and significantly superior robustness over state-of-the-art methods.
📝 Abstract
AI-generated images (AIGIs), such as natural or face images, have become increasingly realistic and indistinguishable, making their detection a critical and pressing challenge. In this paper, we start from a new perspective to excavate the reason behind the failure generalization in AIGI detection, named the extit{asymmetry phenomenon}, where a naively trained detector tends to favor overfitting to the limited and monotonous fake patterns, causing the feature space to become highly constrained and low-ranked, which is proved seriously limiting the expressivity and generalization. One potential remedy is incorporating the pre-trained knowledge within the vision foundation models (higher-ranked) to expand the feature space, alleviating the model's overfitting to fake. To this end, we employ Singular Value Decomposition (SVD) to decompose the original feature space into two orthogonal subspaces. By freezing the principal components and adapting only the remained components, we preserve the pre-trained knowledge while learning forgery-related patterns. Compared to existing full-parameters and LoRA-based tuning methods, we explicitly ensure orthogonality enabling the higher rank of the whole feature space, effectively minimizing overfitting and enhancing generalization. Extensive experiments with our deep analysis on both deepfake and synthetic image detection benchmarks demonstrate superior generalization performance in detection.