🤖 AI Summary
Vision Transformers (ViTs) incur high computational overhead, hindering deployment in resource-constrained settings; existing pruning methods fail under cross-domain transfer because they rely on static, early-stage importance estimates. This paper proposes a dynamic modular pruning framework tailored for cross-domain adaptation. We first observe that task-sensitive layers initially contribute little to downstream feature representations, so early pruning decisions misallocate resources; accordingly, we design a block-level importance scoring mechanism grounded in global performance gain. Further, we introduce inter-layer preservation-ratio optimization and globally coordinated dynamic mask adjustment, enabling parameter reactivation of critical modules during convergence. Under 70% parameter compression, our method incurs only a 0.64% accuracy drop, substantially outperforming state-of-the-art pruning approaches, especially in cross-domain transfer scenarios.
📝 Abstract
Vision Transformers have set new benchmarks across several tasks, but they come with high computational costs that make them impractical for resource-limited hardware. Network pruning reduces computational complexity by removing less important operations while maintaining performance. However, pruning a model on an unseen data domain leads to misjudged weight significance and, in turn, suboptimal resource assignment. In this work, we find that task-sensitive layers initially fail to improve the feature representation on downstream tasks, so early pruning decisions cause performance loss. To address this problem, we introduce Pruning by Block Benefit (P3B), a pruning method that uses the relative contribution at block level to globally assign parameter resources. P3B identifies low-impact components and reduces their parameter allocation while preserving critical ones. Classical pruning-mask optimization struggles to reactivate zeroed mask elements. In contrast, P3B sets a layerwise keep ratio based on global performance metrics, ensuring the reactivation of late-converging blocks. Extensive experiments show that P3B is a state-of-the-art pruning method, with the most noticeable gains on transfer-learning tasks. Notably, P3B preserves high performance even in high-sparsity regimes, losing only 0.64% accuracy at 70% parameter reduction.
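The core idea of assigning layerwise keep ratios from block-level benefit scores can be sketched as follows. This is a minimal illustration under assumed names: `assign_keep_ratios`, the example benefit values, and the simple proportional allocation rule are not the paper's exact formulation, which derives benefit from global performance gain during training.

```python
def assign_keep_ratios(block_benefits, global_keep_ratio):
    """Distribute a global parameter budget across blocks in proportion
    to each block's non-negative benefit score (illustrative rule only).

    Because ratios are recomputed from global scores rather than frozen
    into a binary mask, a block whose benefit grows late in training can
    regain parameters -- the "reactivation" property described above.
    """
    n = len(block_benefits)
    total = sum(block_benefits)
    if total == 0:
        # Degenerate case: no signal yet, keep the budget uniform.
        return [global_keep_ratio] * n
    # Proportional allocation, capped at 1.0 (a block cannot keep >100%
    # of its parameters; the cap may slightly relax the exact budget).
    raw = [global_keep_ratio * n * b / total for b in block_benefits]
    return [min(r, 1.0) for r in raw]

# Example: 4 equally sized blocks, 30% of parameters kept overall
# (i.e., the 70% reduction regime from the abstract).
benefits = [0.05, 0.40, 0.35, 0.20]
ratios = assign_keep_ratios(benefits, global_keep_ratio=0.30)
```

High-benefit blocks receive proportionally larger keep ratios while the average ratio matches the global budget, mirroring the abstract's "globally assign parameter resources" step.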