CORP: Closed-Form One-shot Representation-Preserving Structured Pruning for Vision Transformers

📅 2026-02-05
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of deploying Vision Transformers (ViTs), which suffer from high computational and memory costs, by proposing CORPβ€”a novel post-training pruning framework that achieves structured sparsity without labels, gradients, or fine-tuning. CORP formulates structured pruning as a representation recovery problem and derives an affine compensation relationship between retained and pruned components, enabling direct closed-form weight updates via ridge regression. Requiring only a small amount of unlabeled calibration data, CORP prunes 50% of both MLP and attention modules in DeiT-Huge while preserving a Top-1 accuracy of 82.8%. The entire pruning process completes in under 20 minutes on a single GPU, substantially enhancing inference efficiency.


πŸ“ Abstract
Vision Transformers achieve strong accuracy but incur high compute and memory cost. Structured pruning can reduce inference cost, but most methods rely on retraining or multi-stage optimization. These requirements limit post-training deployment. We propose **CORP**, a closed-form one-shot structured pruning framework for Vision Transformers. CORP removes entire MLP hidden dimensions and attention substructures without labels, gradients, or fine-tuning. It operates under strict post-training constraints using only a small unlabeled calibration set. CORP formulates structured pruning as a representation recovery problem. It models removed activations and attention logits as affine functions of retained components and derives closed-form ridge regression solutions that fold compensation into model weights. This minimizes expected representation error under the calibration distribution. Experiments on ImageNet with DeiT models show strong redundancy in MLP and attention representations. Without compensation, one-shot structured pruning causes severe accuracy degradation. With CORP, models preserve accuracy under aggressive sparsity. On DeiT-Huge, CORP retains 82.8% Top-1 accuracy after pruning 50% of both MLP and attention structures. CORP completes pruning in under 20 minutes on a single GPU and delivers substantial real-world efficiency gains.
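The compensation step the abstract describes can be sketched numerically: model the activations of pruned MLP hidden units as an affine function of the retained ones, solve for that map in closed form via ridge regression on calibration activations, and fold the result into the second projection matrix. The sketch below is illustrative only, assuming a plain two-layer MLP; all sizes, the low-rank calibration data, the energy-based pruning criterion, and the variable names are assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n calibration tokens, d MLP hidden units, d_out output dims.
# Low-rank calibration activations stand in for the redundancy the paper reports.
n, d, d_out, k = 512, 64, 32, 16
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d))  # calibration activations
W2 = rng.normal(size=(d, d_out))                       # second MLP projection

# Keep the half of the hidden units with the largest activation energy
# (an illustrative criterion, not necessarily the paper's).
order = np.argsort(-(X ** 2).sum(axis=0))
R, P = np.sort(order[: d // 2]), np.sort(order[d // 2:])  # retained / pruned

# Ridge regression: pruned activations as an affine function of retained ones.
XR = np.hstack([X[:, R], np.ones((n, 1))])  # bias column makes the map affine
XP = X[:, P]
lam = 1e-3
A = np.linalg.solve(XR.T @ XR + lam * np.eye(XR.shape[1]), XR.T @ XP)

# Fold the compensation into the retained weights: closed form, no gradients.
W2_comp = W2[R] + A[:-1] @ W2[P]  # linear part merges into retained rows
b_comp = A[-1] @ W2[P]            # affine offset merges into the output bias

# Compare outputs on the calibration data.
Y_dense = X @ W2
Y_naive = X[:, R] @ W2[R]                # one-shot pruning, no compensation
Y_corp = X[:, R] @ W2_comp + b_comp      # with closed-form compensation

err_naive = np.linalg.norm(Y_dense - Y_naive) / np.linalg.norm(Y_dense)
err_corp = np.linalg.norm(Y_dense - Y_corp) / np.linalg.norm(Y_dense)
print(f"relative output error  naive: {err_naive:.3f}  compensated: {err_corp:.2e}")
```

Because the synthetic activations are genuinely redundant (rank 16 across 64 units), the compensated output error collapses to near zero while naive pruning leaves a large residual, mirroring the abstract's claim that uncompensated one-shot pruning degrades representations severely.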
Problem

Research questions and friction points this paper is trying to address.

structured pruning, Vision Transformers, post-training, representation preservation, one-shot
Innovation

Methods, ideas, or system contributions that make the work stand out.

structured pruning, Vision Transformers, post-training compression, closed-form solution, representation preservation