🤖 AI Summary
Deploying deep visual detectors on edge devices remains challenging due to their high computational and memory demands; existing pruning methods neglect localization cues and rely passively on pre-trained models. Method: We propose a location-aware, discriminative-analysis-driven compression framework that explicitly incorporates object spatial coordinates into the pruning process. By modeling neuron/filter discriminability and aligning feature representations immediately upstream of the detection head, our method jointly optimizes feature-importance tracing and localization-aware compression via alternating optimization, supporting mainstream architectures such as YOLO and Faster R-CNN. Results: On the KITTI and COCO benchmarks, our approach reduces FLOPs and parameter counts by 30–50%, outperforms four state-of-the-art methods, and improves detection accuracy (AP gains of +0.5–1.2) over the uncompressed baselines, thereby breaking the conventional passive pruning paradigm.
📝 Abstract
Deep neural networks are powerful, yet their high complexity severely limits their deployment on billions of resource-constrained edge devices. Pruning is a crucial network compression technique, but most existing methods target classification models and pay limited attention to detection; even those that do address detection rarely exploit essential localization information. Moreover, many pruning methods passively rely on pre-trained models, in which useful and useless components are intertwined, making it difficult to remove the latter without harming the former at the neuron/filter level. To address these issues, we propose a proactive, detection-discriminants-based network compression approach for deep visual detectors that alternates between two steps: (1) maximizing and compressing detection-related discriminants and aligning them with a subset of neurons/filters immediately before the detection head, and (2) tracing the detection-related discriminating power across the layers and discarding features of lower importance. Object location information is exploited in both steps. Extensive experiments with four advanced detection models and four state-of-the-art competing methods on the KITTI and COCO datasets demonstrate the superiority of our approach. Remarkably, our compressed models can even outperform the original base models while substantially reducing complexity.
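The abstract's central idea, using object location when ranking which filters to keep, can be illustrated with a minimal sketch. This is not the paper's actual method: the functions `filter_importance` and `prune_filters`, and the background `penalty` down-weighting, are assumptions introduced purely to show how localization cues (ground-truth boxes) might bias an importance score toward filters that respond inside object regions.

```python
# Hypothetical sketch (not the paper's implementation): location-aware
# filter importance scoring followed by keep-top-k pruning.

def filter_importance(activations, boxes, penalty=0.1):
    """Score each filter's feature map, counting responses inside
    ground-truth boxes fully and down-weighting background responses.

    activations: dict filter_id -> H x W feature map (list of lists)
    boxes: list of (x0, y0, x1, y1) ground-truth boxes in feature-map coords
    """
    def in_box(x, y):
        return any(x0 <= x <= x1 and y0 <= y <= y1
                   for x0, y0, x1, y1 in boxes)

    scores = {}
    for fid, fmap in activations.items():
        total = 0.0
        for y, row in enumerate(fmap):
            for x, v in enumerate(row):
                # object-region responses count fully; background is damped
                w = 1.0 if in_box(x, y) else penalty
                total += w * abs(v)
        scores[fid] = total
    return scores

def prune_filters(scores, keep_ratio=0.5):
    """Return the ids of the top keep_ratio fraction of filters by score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_ratio))
    return set(ranked[:n_keep])

# Usage: a filter firing strongly on background ('f1') scores lower than
# a weaker filter firing inside a box ('f0'), so localization changes the ranking.
acts = {'f0': [[1, 1], [1, 1]],   # modest, includes the box at (0, 0)
        'f1': [[0, 5], [0, 0]],   # strong but purely background
        'f2': [[3, 0], [0, 0]]}   # strong inside the box
scores = filter_importance(acts, boxes=[(0, 0, 0, 0)])
kept = prune_filters(scores, keep_ratio=0.5)   # keeps only 'f2' here
```

A plain magnitude criterion would rank `f1` above `f0`; weighting by box membership reverses that, which is the kind of localization cue the abstract argues classification-oriented pruning ignores.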