AI Summary
To address feature sparsity in small-object detection caused by pooling operations and the compromised multi-scale perception capability of conventional dilated convolutions, this paper proposes the Switchable Atrous Convolution Network (SAC-Net). Methodologically, SAC-Net introduces three key innovations: (1) dynamic dilation rate switching during forward propagation to simultaneously preserve dense features and maintain multi-scale receptive fields; (2) a coupled design of depthwise separable dilated convolutions with a unified global-local contextual modeling mechanism, enabling scale-invariant feature enhancement; and (3) an adaptive feature fusion module. Integrated into the EfficientDet framework and evaluated on COCO, SAC-Net achieves a 4.2% absolute gain in AP_S (small-object average precision) and a 2.8% improvement in overall AP, outperforming state-of-the-art methods. These results empirically validate SAC-Net's superior scale robustness and capacity to retain high-density discriminative features.
Abstract
Dense features are important for detecting minute objects in images. Despite the remarkable efficacy of CNN models in multi-scale object detection, they often fail to detect smaller objects because dense features are lost during pooling. Atrous convolution addresses this issue by applying sparse kernels. However, sparse kernels can degrade the multi-scale detection efficacy of the CNN model. In this paper, we propose an object detection model using a Switchable (adaptive) Atrous Convolutional Network (SAC-Net) based on the EfficientDet model. A fixed atrous rate in the convolutional layers limits the performance of CNN models. To overcome this limitation, we introduce a switchable mechanism that dynamically adjusts the atrous rate during the forward pass. The proposed SAC-Net combines the benefits of both low-level and high-level features to improve performance on multi-scale object detection tasks without losing dense features. Further, we apply a depth-wise switchable atrous rate to the proposed network to improve its scale-invariant features. Finally, we apply global context to the proposed model. Our extensive experiments on benchmark datasets demonstrate that the proposed SAC-Net outperforms state-of-the-art models by a significant margin in terms of accuracy.
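The idea of switching the atrous rate during the forward pass can be illustrated with a small, dependency-free sketch. It is not the SAC-Net implementation: the 1D setting, the function names, the two candidate rates, and the sigmoid gate computed from the mean input are all assumptions made for illustration, since the abstract does not specify how the switch is computed.

```python
import math


def dilated_conv1d(signal, kernel, rate):
    """Zero-padded, same-length 1D atrous convolution (cross-correlation form)."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * rate  # kernel taps sit `rate` samples apart
            if 0 <= idx < len(signal):     # taps outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def effective_kernel_size(kernel_size, rate):
    """Span covered by a dilated kernel: k + (k - 1) * (rate - 1)."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def switchable_atrous_conv1d(signal, kernel, rates=(1, 3),
                             gate_weight=1.0, gate_bias=0.0):
    """Blend the outputs of two atrous rates with an input-dependent gate.

    Hypothetical gate: a sigmoid of the mean input value. A gate near 1
    favours the small rate (dense sampling); a gate near 0 favours the
    large rate (wider receptive field).
    """
    mean = sum(signal) / len(signal)
    gate = 1.0 / (1.0 + math.exp(-(gate_weight * mean + gate_bias)))
    small = dilated_conv1d(signal, kernel, rates[0])
    large = dilated_conv1d(signal, kernel, rates[1])
    return [gate * s + (1.0 - gate) * l for s, l in zip(small, large)]
```

With a 3-tap kernel, rate 1 covers 3 samples and rate 3 covers 7, so the gate trades dense sampling against a wider receptive field using the same kernel weights. A real detector would more plausibly compute the gate per spatial location with learned parameters, but that detail is not stated in the abstract.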