🤖 AI Summary
To address the challenge of simultaneously achieving high compression ratios and high accuracy when deploying deep neural networks on resource-constrained devices, this paper proposes a Hessian-curvature-aware dynamic structured pruning method. The approach introduces three key innovations: (1) a cyclic weight-merging mechanism enabling fine-grained structured pruning; (2) efficient parameter importance estimation via power iteration to approximate Hessian-vector products, circumventing expensive second-order computations; and (3) adaptive thresholding guided by local curvature information of the loss function. Extensive experiments on ResNet-18/56 and MobileNetV2 across CIFAR-10, CIFAR-100, and ImageNet demonstrate that the method achieves up to 4.2× FLOPs reduction while improving top-1 accuracy by up to 1.8%, consistently outperforming state-of-the-art pruning techniques.
📝 Abstract
Deep learning algorithms are becoming an essential component of many artificial intelligence (AI)-driven applications, many of which run on resource- and energy-constrained systems. Although various techniques have been proposed for compressing neural network models for efficient deployment, neural pruning is one of the fastest and most effective, providing high compression gains at minimal cost. To achieve a better trade-off between performance and model complexity, we propose a novel neural network pruning approach that uses Hessian-vector products to approximate crucial curvature information in the loss function, significantly reducing computational demands. By employing the power iteration method, our algorithm effectively identifies and preserves essential information, ensuring a balanced trade-off between model accuracy and computational efficiency. Herein, we introduce CAMP-HiVe, a cyclic pair merging-based pruning method with Hessian-vector approximation that iteratively consolidates weight pairs, combining significant and less significant weights, thus effectively streamlining the model while preserving its performance. This dynamic, adaptive framework allows real-time adjustment of weight significance, ensuring that only the most critical parameters are retained. Our experimental results demonstrate that the proposed method achieves significant reductions in computational requirements while maintaining high performance across different neural network architectures, e.g., ResNet-18, ResNet-56, and MobileNetV2, on standard benchmark datasets, e.g., CIFAR-10, CIFAR-100, and ImageNet, and that it outperforms existing state-of-the-art neural pruning methods.
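The core numerical idea the abstract describes, estimating loss curvature via Hessian-vector products and power iteration without ever materializing the full Hessian, can be illustrated with a minimal sketch. This is not the paper's CAMP-HiVe implementation; it assumes a toy quadratic loss (whose Hessian is known exactly, so the estimate can be checked) and uses a finite-difference Hessian-vector product:

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * w^T A w, so the Hessian is exactly A.
# Illustrative only -- not the paper's method; it shows the HVP + power
# iteration building block the abstract refers to.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
A = A @ A.T  # symmetric positive semi-definite Hessian

def grad(w):
    # Gradient of the toy loss: ∇L(w) = A w.
    return A @ w

def hvp(w, v, eps=1e-5):
    # Finite-difference Hessian-vector product:
    # H v ≈ (∇L(w + eps*v) - ∇L(w)) / eps.
    # Only gradients are needed; the full Hessian is never formed,
    # which is what makes curvature estimation cheap at scale.
    return (grad(w + eps * v) - grad(w)) / eps

def power_iteration(w, iters=200):
    # Estimate the dominant Hessian eigenpair using only HVPs.
    v = rng.normal(size=w.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        hv = hvp(w, v)
        v = hv / np.linalg.norm(hv)
    lam = v @ hvp(w, v)  # Rayleigh quotient approximates the top eigenvalue
    return lam, v

w = rng.normal(size=5)
lam, v = power_iteration(w)
lam_true = np.linalg.eigvalsh(A)[-1]
print(lam, lam_true)
```

In a pruning context, such curvature estimates score how sensitive the loss is to perturbing particular parameters, so low-curvature (less significant) weights become merge or removal candidates.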