An Information Theory-inspired Strategy for Automatic Network Pruning

📅 2021-08-19
🏛️ International Journal of Computer Vision
📈 Citations: 13
Influential: 1
🤖 AI Summary
To address the challenges of excessive human intervention, weak theoretical foundations, and poor cross-device generalization in deep model compression for resource-constrained devices, this paper proposes a fully automated, search-free pruning method. Grounded in information bottleneck theory, we introduce the normalized Hilbert-Schmidt Independence Criterion (nHSIC) on network activations as a stable, theoretically grounded layer-wise importance metric, and rigorously prove that optimizing nHSIC simultaneously minimizes the mutual information between layers. Pruning is formulated as a convex optimization problem (solvable by off-the-shelf solvers such as OSQP), enabling efficient, deterministic solutions. On ImageNet, our method reduces ResNet-50 FLOPs by 45.3% while retaining 75.75% top-1 accuracy, outperforming state-of-the-art approaches. Crucially, each pruning optimization completes in only a few seconds, eliminating iterative search and manual tuning.
📝 Abstract
Despite superior performance on many computer vision tasks, deep convolutional neural networks typically need to be compressed before deployment on devices with resource constraints. Most existing network pruning methods require laborious human effort and prohibitive computational resources, especially when the constraints change; this practically limits the application of model compression when the model needs to be deployed on a wide range of devices. Besides, existing methods still lack theoretical guidance. In this paper we propose an information theory-inspired strategy for automatic model compression. The principle behind our method is the information bottleneck theory, i.e., hidden representations should compress the redundant information they share with one another. We thus introduce the normalized Hilbert-Schmidt Independence Criterion (nHSIC) on network activations as a stable and generalized indicator of layer importance. When a certain resource constraint is given, we integrate the HSIC indicator with the constraint to transform the architecture search problem into a linear programming problem with quadratic constraints. Such a problem is easily solved by a convex optimization method within a few seconds. We also provide a rigorous proof that optimizing the normalized HSIC simultaneously minimizes the mutual information between different layers. Without any search process, our method achieves better compression tradeoffs compared to state-of-the-art compression algorithms. For instance, with ResNet-50, we achieve a 45.3% FLOPs reduction with 75.75% top-1 accuracy on ImageNet. Code is available at https://github.com/MAC-AutoML/ITPruner/tree/master.
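The nHSIC indicator described in the abstract can be sketched in a few lines. Below is a minimal numpy illustration of a biased empirical HSIC estimator with linear kernels, normalized as HSIC(X, Y) / sqrt(HSIC(X, X) * HSIC(Y, Y)); the function names and the choice of linear kernels are illustrative assumptions for this sketch, not necessarily the paper's exact estimator.

```python
import numpy as np

def hsic(K, L):
    """Biased empirical HSIC between two n x n kernel matrices."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def nhsic(X, Y):
    """Normalized HSIC between activation matrices X (n x d1) and
    Y (n x d2) sampled from two layers, using linear kernels."""
    K, L = X @ X.T, Y @ Y.T
    return hsic(K, L) / np.sqrt(hsic(K, K) * hsic(L, L))
```

The normalization makes the score invariant to the scale of the activations (e.g., `nhsic(X, 3 * X)` is still 1), which is what makes it usable as a stable cross-layer importance indicator.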
Problem

Research questions and friction points this paper is trying to address.

Automatic network pruning for resource-constrained devices
Theoretical guidance lacking in existing compression methods
Optimizing layer importance via information bottleneck theory
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses information bottleneck theory for compression
Employs nHSIC as layer importance indicator
Transforms pruning into convex optimization problem
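The last bullet, per the abstract, means a linear objective (importance-weighted layer widths) under a quadratic FLOPs constraint, solved with a convex solver. As a toy stand-in that avoids a solver dependency, the sketch below finds a feasible width allocation by bisecting a single global scale on importance-proportional ratios; all numbers, names, and the bisection heuristic itself are illustrative assumptions, not the paper's method.

```python
import numpy as np

imp = np.array([0.9, 0.6, 0.8, 0.5])     # hypothetical nHSIC-style layer scores
flops = np.array([4e8, 3e8, 2e8, 1e8])   # per-layer FLOPs at full width (toy values)
budget = 0.5 * flops.sum()               # keep ~50% of total FLOPs

def used_flops(r):
    # A conv layer's FLOPs scale with both its own width ratio and the
    # previous layer's, which is why the true constraint is quadratic in r.
    r_prev = np.concatenate(([1.0], r[:-1]))
    return float(np.sum(flops * r * r_prev))

def allocate(imp, lo=0.05, iters=50):
    # Bisect a scale t so that r = clip(t * imp, lo, 1) meets the budget;
    # used_flops is monotone in t, so bisection converges to a feasible point.
    t_lo, t_hi = 0.0, 1.0 / imp.max()
    for _ in range(iters):
        t = (t_lo + t_hi) / 2
        if used_flops(np.clip(t * imp, lo, 1.0)) > budget:
            t_hi = t
        else:
            t_lo = t
    return np.clip(t_lo * imp, lo, 1.0)

ratios = allocate(imp)  # per-layer width ratios to keep
```

Unlike a solver-based QP solution, this heuristic only guarantees feasibility, not optimality; it is meant to show the shape of the problem (linear score to maximize, quadratic FLOPs to bound), not to reproduce the paper's few-second convex optimization.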
Xiawu Zheng
Associate Professor, IEEE Senior Member, Xiamen University
Automated Machine Learning, Network Compression, Neural Architecture Search, AutoML
Yuexiao Ma
Media Analytics and Computing Lab, Department of Artificial Intelligence, School of Informatics, Xiamen, China.
Teng Xi
Department of Computer Vision Technology (VIS), Baidu Inc, Beijing, China.
Gang Zhang
Tsinghua University
computer vision
Errui Ding
Baidu Inc.
computer vision, machine learning
Yuchao Li
Arizona State University
Optimal Control, Reinforcement Learning
Jie Chen
Institute of Digital Media Peking University, Beijing, China.
Yonghong Tian
National Engineering Laboratory for Video Technology (NELVT), School of Electronics Engineering and Computer Science, Beijing, China.
Rongrong Ji
Media Analytics and Computing Lab, Department of Artificial Intelligence, School of Informatics, Xiamen, China.