AI Summary
This work addresses the challenge of deploying Vision Transformers (ViTs), which suffer from high computational and memory costs, by proposing CORP, a novel post-training pruning framework that achieves structured sparsity without labels, gradients, or fine-tuning. CORP formulates structured pruning as a representation recovery problem and derives an affine compensation relationship between retained and pruned components, enabling direct closed-form weight updates via ridge regression. Requiring only a small amount of unlabeled calibration data, CORP prunes 50% of both MLP and attention modules in DeiT-Huge while preserving a Top-1 accuracy of 82.8%. The entire pruning process completes in under 20 minutes on a single GPU and substantially improves inference efficiency.
Abstract
Vision Transformers achieve strong accuracy but incur high compute and memory costs. Structured pruning can reduce inference cost, but most methods rely on retraining or multi-stage optimization, which limits post-training deployment. We propose **CORP**, a closed-form one-shot structured pruning framework for Vision Transformers. CORP removes entire MLP hidden dimensions and attention substructures without labels, gradients, or fine-tuning. It operates under strict post-training constraints using only a small unlabeled calibration set. CORP formulates structured pruning as a representation recovery problem: it models removed activations and attention logits as affine functions of retained components and derives closed-form ridge regression solutions that fold compensation into the model weights, minimizing expected representation error under the calibration distribution. Experiments on ImageNet with DeiT models show strong redundancy in MLP and attention representations. Without compensation, one-shot structured pruning causes severe accuracy degradation; with CORP, models preserve accuracy under aggressive sparsity. On DeiT-Huge, CORP retains 82.8% Top-1 accuracy after pruning 50% of both MLP and attention structures. CORP completes pruning in under 20 minutes on a single GPU and delivers substantial real-world efficiency gains.
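The core mechanism, as described above, is to treat pruned activations as affine functions of retained ones and fold a ridge-regression compensation into the downstream weights. A minimal NumPy sketch of this idea on a toy MLP output projection follows. This is an illustration under stated assumptions, not the paper's implementation: the pruning criterion (smallest mean activation), the ridge penalty `lam`, and all array names are hypothetical choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: hidden activations H (N x d_hidden) from a calibration set
# feed an output projection W2 (d_hidden x d_out), as in a transformer MLP.
N, d_hidden, d_out = 256, 16, 8
H = np.maximum(rng.normal(size=(N, d_hidden)), 0.0)  # ReLU-like calibration activations
W2 = rng.normal(size=(d_hidden, d_out))

# Pick hidden units to prune. Smallest mean activation is a simple stand-in
# criterion for this sketch, not the paper's selection rule.
keep = np.sort(np.argsort(H.mean(axis=0))[d_hidden // 2:])
prune = np.setdiff1d(np.arange(d_hidden), keep)
H_r, H_p = H[:, keep], H[:, prune]

# Ridge regression: predict pruned activations as an affine function of the
# retained ones (append a ones column so the fit includes a bias term).
lam = 1e-2
Hr1 = np.hstack([H_r, np.ones((N, 1))])
B = np.linalg.solve(Hr1.T @ Hr1 + lam * np.eye(Hr1.shape[1]), Hr1.T @ H_p)

# Fold the compensation into the retained output weights: the pruned units'
# contribution W2[prune] is rerouted through B (last row of B is the bias).
W2_comp = W2[keep] + B[:-1] @ W2[prune]
bias_comp = B[-1] @ W2[prune]

# Compare pruned outputs with and without compensation on the calibration set.
full = H @ W2
naive = H_r @ W2[keep]
comp = H_r @ W2_comp + bias_comp
err_naive = np.linalg.norm(full - naive)
err_comp = np.linalg.norm(full - comp)
print(f"naive pruning error: {err_naive:.3f}, compensated error: {err_comp:.3f}")
```

On the calibration data the compensated outputs are, by construction of the least-squares fit, closer to the full model's outputs than naive pruning; the closed-form solve is why the whole procedure needs no gradients or labels.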