🤖 AI Summary
This work addresses the limitations of conventional magnitude-based pruning methods, which often fail to accurately assess the functional contribution of individual layers in object detection networks, leading to suboptimal trade-offs between accuracy and efficiency. To overcome this, the authors propose an interpretability-inspired, layer-wise pruning framework that introduces, for the first time, a SHAP-inspired gradient-activation attribution mechanism into object detection model compression. This data-driven approach evaluates the task-specific functional importance of each layer to guide structured pruning. The method is compatible with diverse detection architectures, including Faster R-CNN and YOLOv8, and demonstrates superior performance: on ShuffleNetV2 it achieves a 10% inference speedup with no mAP degradation, whereas L1 pruning degrades performance by 13.7%; on RetinaNet it preserves the baseline mAP exactly, significantly outperforming traditional pruning strategies.
📝 Abstract
Deep neural networks (DNNs) have achieved remarkable success in object detection tasks, but their increasing complexity poses significant challenges for deployment on resource-constrained platforms. While model compression techniques such as pruning have emerged as essential tools, traditional magnitude-based pruning methods do not necessarily align with the true functional contribution of network components to task-specific performance. In this work, we present an explainability-inspired, layer-wise pruning framework tailored for efficient object detection. Our approach leverages a SHAP-inspired gradient-activation attribution to estimate layer importance, providing a data-driven proxy for functional contribution rather than relying solely on static weight magnitudes. We conduct comprehensive experiments across diverse object detection architectures, including ResNet-50, MobileNetV2, ShuffleNetV2, Faster R-CNN, RetinaNet, and YOLOv8, evaluating performance on the Microsoft COCO 2017 validation set. The results show that the proposed attribution-inspired pruning consistently identifies different layers as least important compared to L1-norm-based methods, leading to improved accuracy-efficiency trade-offs. Notably, for ShuffleNetV2, our method yields a 10% empirical increase in inference speed, whereas L1 pruning degrades performance by 13.7%. For RetinaNet, the proposed approach preserves the baseline mAP (0.151) with negligible impact on inference speed, while L1 pruning incurs a 1.3% mAP drop for a 6.2% speed increase. These findings highlight the importance of data-driven layer importance assessment and demonstrate that explainability-inspired compression offers a principled direction for deploying deep neural networks on edge and resource-constrained platforms while preserving both performance and interpretability.
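The abstract does not spell out the attribution mechanism, but a common first-order, SHAP-inspired proxy for layer importance is the mean magnitude of activation times gradient-of-loss. The sketch below illustrates that idea in PyTorch under explicit assumptions: the function name `layer_importance`, the restriction to `nn.Linear` layers, and the toy model are all illustrative, not the paper's actual implementation.

```python
# Hypothetical sketch of a gradient-activation layer-importance score,
# assuming the attribution resembles mean |activation * d(loss)/d(activation)|
# per layer. All names here are illustrative, not the paper's code.
import torch
import torch.nn as nn

def layer_importance(model, inputs, targets, loss_fn):
    """Return {layer_name: mean |activation * grad_of_loss_wrt_activation|}."""
    acts, scores, hooks = {}, {}, []

    def save_act(name):
        def hook(module, inp, out):
            out.retain_grad()          # keep the gradient of this intermediate tensor
            acts[name] = out
        return hook

    # Restrict to one prunable layer type for the sketch (assumption).
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            hooks.append(module.register_forward_hook(save_act(name)))

    loss = loss_fn(model(inputs), targets)
    loss.backward()                    # populates a.grad for retained activations

    for name, a in acts.items():
        scores[name] = (a * a.grad).abs().mean().item()

    for h in hooks:
        h.remove()
    return scores

# Toy usage: a small MLP standing in for a detector backbone.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
x, y = torch.randn(32, 8), torch.randint(0, 4, (32,))
scores = layer_importance(model, x, y, nn.CrossEntropyLoss())
# Lower score -> weaker task-specific contribution -> pruning candidate.
```

Under this reading, the layer with the smallest score is pruned first, in contrast to L1 pruning, which would instead rank layers by static weight norms.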