🤖 AI Summary
To address the computational inefficiency caused by rapidly increasing parameter counts in CNNs and Transformers, this paper proposes a structured pruning method based on exact Hessian-vector products for fine-grained, low-distortion model compression. Methodologically, it directly computes full-parameter Hessian-vector products—the first such approach—and theoretically derives necessary and sufficient conditions for non-zero off-diagonal blocks in inter-layer Hessian submatrices. It further establishes an exact second-order Taylor expansion-based pruning criterion, eliminating approximations inherent in conventional first-order or surrogate second-order importance estimation. This significantly improves the fidelity of parameter importance assessment. Evaluated on VGG19, ResNet32/50, and ViT-B/16 across CIFAR-10/100 and ImageNet, the method achieves near-lossless accuracy (≤0.3% drop), 1.8–2.4× inference speedup, and 42%–67% FLOPs reduction—outperforming baselines including OBD.
📝 Abstract
The increasing complexity and parameter count of Convolutional Neural Networks (CNNs) and Transformers pose challenges in terms of computational efficiency and resource demands. Pruning has been identified as an effective strategy to address these challenges by removing redundant elements such as neurons, channels, or connections, thereby enhancing computational efficiency without significantly compromising performance. This paper builds on the foundational work of Optimal Brain Damage (OBD) by advancing the methodology of parameter importance estimation using the Hessian matrix. Unlike previous approaches that rely on approximations, we introduce Optimal Brain Apoptosis (OBA), a novel pruning method that computes the Hessian-vector product directly for each parameter. By decomposing the Hessian matrix across network layers and identifying the conditions under which inter-layer Hessian submatrices are non-zero, we propose a highly efficient technique for computing the second-order Taylor expansion of parameters. This approach allows for a more precise pruning process, particularly for CNNs and Transformers, as validated in our experiments on VGG19, ResNet32, ResNet50, and ViT-B/16 over the CIFAR-10, CIFAR-100, and ImageNet datasets. Our code is available at https://github.com/NEU-REAL/OBA.
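The second-order Taylor criterion behind OBD-style pruning can be made concrete with a small numerical sketch. The toy quadratic loss, the matrix `A`, and the scoring code below are illustrative assumptions for exposition, not the paper's OBA implementation: a quadratic loss is used so that the Hessian is known exactly and the Taylor expansion is exact, and the Hessian-vector product `H·w` stands in for what autograd-based double backward would compute in a real network.

```python
import numpy as np

# Toy setting: quadratic loss L(w) = 0.5 w^T A w + b^T w, so the Hessian
# is exactly A and the second-order Taylor expansion has no remainder.
# A, b, w, and the score formulas are illustrative assumptions.
rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)        # symmetric positive-definite Hessian
b = rng.standard_normal(n)
w = rng.standard_normal(n)         # current parameters

def loss(w):
    return 0.5 * w @ A @ w + b @ w

grad = A @ w + b                   # exact gradient at w
hvp = A @ w                        # Hessian-vector product H·w: one matvec,
                                   # no explicit Hessian materialized

# Pruning parameter i means perturbing by -w_i e_i. The Taylor expansion
# gives the exact per-parameter loss change (diagonal curvature term only):
#   dL_i = -g_i w_i + 0.5 A_ii w_i^2
scores = -grad * w + 0.5 * np.diag(A) * w**2

# Brute-force check: zero each coordinate and measure the true change.
for i in range(n):
    w_pruned = w.copy()
    w_pruned[i] = 0.0
    assert np.isclose(loss(w_pruned) - loss(w), scores[i])

# HvP-based scores include cross terms: s_i = -g_i w_i + 0.5 w_i (Hw)_i.
# Their sum equals the exact loss change of removing every parameter jointly.
scores_joint = -grad * w + 0.5 * w * hvp
assert np.isclose(scores_joint.sum(), loss(np.zeros(n)) - loss(w))
```

The contrast between `scores` (diagonal-only, as in OBD) and `scores_joint` (obtained from a single Hessian-vector product, capturing off-diagonal interactions) illustrates why exact Hessian-vector products can give a higher-fidelity importance estimate than diagonal approximations.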