Cut Less, Fold More: Model Compression through the Lens of Projection Geometry

📅 2026-02-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of compressing neural networks without retraining to reduce deployment costs. It presents a unified geometric perspective by interpreting structured pruning and model folding as orthogonal projections: pruning corresponds to an axis-aligned projection, while folding achieves a low-rank projection through weight clustering. Theoretical analysis demonstrates that, at rank distance one, folding incurs smaller parameter reconstruction error and functional perturbation than pruning. Extensive experiments across more than 1,000 checkpoints of ResNet, ViT, CLIP, and LLaMA architectures confirm that folding consistently outperforms pruning at medium to high compression ratios, with notable gains on CIFAR-10, ImageNet-1K, and C4. This study establishes a compression paradigm that is theoretically grounded, requires no calibration, and demonstrates strong empirical efficacy.

📝 Abstract
Compressing neural networks without retraining is vital for deployment at scale. We study calibration-free compression through the lens of projection geometry: structured pruning is an axis-aligned projection, whereas model folding performs a low-rank projection via weight clustering. We formalize both as orthogonal operators and show that, within a rank distance of one, folding provably yields smaller parameter reconstruction error and, under mild smoothness assumptions, smaller functional perturbation than pruning. At scale, we evaluate >1,000 checkpoints spanning ResNet18, PreActResNet18, ViT-B/32, and CLIP ViT-B/32 on CIFAR-10 and ImageNet-1K, covering diverse training hyperparameters (optimizers, learning rates, augmentations, regularization, sharpness-aware training), as well as multiple LLaMA-family 60M and 130M parameter models trained on C4. We show that folding typically achieves higher post-compression accuracy, with the largest gains at moderate-to-high compression. The gap narrows, and occasionally reverses, under specific training setups. Our results position folding as a geometry-aware, calibration-free alternative to pruning that is often superior in practice and principled in theory.
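The projection contrast in the abstract can be made concrete with a toy numerical sketch (illustrative only, not the paper's implementation): structured pruning zeroes whole rows of a weight matrix (an axis-aligned projection), while folding clusters near-duplicate rows and replaces each row by its cluster mean (a low-rank reconstruction). When channels are redundant, folding's reconstruction error is typically much smaller. The clustering here is a minimal hand-rolled k-means; the initialization and toy matrix are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weight matrix: 8 output channels over 16 inputs, built so rows i and
# i+4 are near-duplicates -- the redundant regime where folding should win.
base = rng.normal(size=(4, 16))
W = np.vstack([base + 0.05 * rng.normal(size=(4, 16)),
               base + 0.05 * rng.normal(size=(4, 16))])

k = 4  # number of channels to keep

# Structured pruning as an axis-aligned projection: keep the k largest-norm
# rows, zero the rest.
norms = np.linalg.norm(W, axis=1)
keep = np.argsort(norms)[-k:]
W_prune = np.zeros_like(W)
W_prune[keep] = W[keep]

# Model folding as a low-rank projection via weight clustering: k-means on
# rows, then replace each row by its cluster centroid. Deterministic init
# from the first k rows keeps this toy reproducible.
centroids = W[:k].copy()
for _ in range(20):
    dists = np.linalg.norm(W[:, None] - centroids[None], axis=2)
    labels = dists.argmin(axis=1)
    for c in range(k):
        if (labels == c).any():
            centroids[c] = W[labels == c].mean(axis=0)
W_fold = centroids[labels]

# Frobenius-norm parameter reconstruction error for each projection.
err_prune = np.linalg.norm(W - W_prune)
err_fold = np.linalg.norm(W - W_fold)
print(f"pruning error: {err_prune:.3f}")
print(f"folding error: {err_fold:.3f}")
```

Because half the rows are near-copies of the other half, folding recovers every row up to the small noise, while pruning discards four rows entirely, matching the abstract's claim that folding yields smaller reconstruction error in the redundant regime.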
Problem

Research questions and friction points this paper is trying to address.

model compression
structured pruning
model folding
projection geometry
calibration-free
Innovation

Methods, ideas, or system contributions that make the work stand out.

model folding
projection geometry
calibration-free compression
low-rank projection
structured pruning