Scale-Invariant Object Detection by Adaptive Convolution with Unified Global-Local Context

📅 2024-09-17

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

184K/year

🤖 AI Summary

To address feature sparsity in small-object detection caused by pooling operations and the compromised multi-scale perception capability of conventional dilated convolutions, this paper proposes the Switchable Atrous Convolution Network (SAC-Net). Methodologically, SAC-Net introduces three key innovations: (1) dynamic dilation rate switching during forward propagation to simultaneously preserve dense features and maintain multi-scale receptive fields; (2) a coupled design of depthwise separable dilated convolutions with a unified global–local contextual modeling mechanism, enabling scale-invariant feature enhancement; and (3) an adaptive feature fusion module. Integrated into the EfficientDet framework and evaluated on COCO, SAC-Net achieves a 4.2% absolute gain in APₛ (small-object average precision) and a 2.8% improvement in overall AP, outperforming state-of-the-art methods. These results empirically validate SAC-Net’s superior scale robustness and capacity to retain high-density discriminative features.

Technology Category

Application Category

📝 Abstract

Dense features are important for detecting minute objects in images. Unfortunately, despite the remarkable efficacy of the CNN models in multi-scale object detection, CNN models often fail to detect smaller objects in images due to the loss of dense features during the pooling process. Atrous convolution addresses this issue by applying sparse kernels. However, sparse kernels often can lose the multi-scale detection efficacy of the CNN model. In this paper, we propose an object detection model using a Switchable (adaptive) Atrous Convolutional Network (SAC-Net) based on the efficientDet model. A fixed atrous rate limits the performance of the CNN models in the convolutional layers. To overcome this limitation, we introduce a switchable mechanism that allows for dynamically adjusting the atrous rate during the forward pass. The proposed SAC-Net encapsulates the benefits of both low-level and high-level features to achieve improved performance on multi-scale object detection tasks, without losing the dense features. Further, we apply a depth-wise switchable atrous rate to the proposed network, to improve the scale-invariant features. Finally, we apply global context on the proposed model. Our extensive experiments on benchmark datasets demonstrate that the proposed SAC-Net outperforms the state-of-the-art models by a significant margin in terms of accuracy.

Problem

Research questions and friction points this paper is trying to address.

Detects small objects by preserving dense features in images.

Improves multi-scale object detection using adaptive atrous convolution.

Enhances scale-invariant features with global-local context integration.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Switchable Atrous Convolutional Network (SAC-Net)

Dynamic atrous rate adjustment mechanism

Global-local context integration for scale-invariance

🔎 Similar Papers

SimPLR: A Simple and Plain Transformer for Efficient Object Detection and Segmentation