ShiftwiseConv: Small Convolutional Kernel with Large Kernel Effect

📅 2024-01-23

📈 Citations: 5

✨ Influential: 0


🤖 AI Summary

Problem: Large-kernel convolutions in CNNs show diminishing performance gains as kernel size grows, and the mechanisms behind their effectiveness remain poorly understood.

Method: This paper identifies and decouples two fundamental functions of large-kernel convolutions, granular feature extraction and multi-path feature fusion, and proposes the Shiftwise (SW) convolution operator. Using only 3×3 kernels joined by shift-based sparse multi-path connections, SW models long-range dependencies and replicates the feature extraction and fusion capabilities of large kernels. It abandons the paradigm of increasing kernel size, substantially reducing computational cost and parameter count.

Contribution/Results: SW-CNN consistently outperforms state-of-the-art large-kernel models, including SLaK and UniRepLKNet, across image classification, object detection, and semantic segmentation. These results support the effectiveness and generality of the "small kernel, large receptive field" design principle.
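The claim that 3×3 kernels can reproduce a large kernel's effect through shifts rests on a simple identity: a K×K kernel can be cut into 3×3 tiles, and convolving with it equals summing 3×3 convolutions whose outputs are shifted to each tile's position. The NumPy sketch below checks that identity for a single-channel map with zero padding. It illustrates the underlying idea under those assumptions only; it is not the SW operator from the linked repository, and the function names are hypothetical.

```python
import numpy as np


def conv2d_same(x, w):
    """Cross-correlate a single-channel map x (H, W) with an odd-sized square kernel w,
    zero-padding x so the output keeps shape (H, W)."""
    k = w.shape[0]
    p = k // 2
    H, W = x.shape
    xp = np.pad(x, p)  # zero padding on all four sides
    out = np.zeros((H, W))
    for i in range(k):
        for j in range(k):
            # each kernel tap multiplies a shifted view of the padded input
            out += w[i, j] * xp[i:i + H, j:j + W]
    return out


def large_kernel_via_shifted_3x3(x, w_large):
    """Reproduce conv2d_same(x, w_large) using only 3x3 convolutions plus shifts.
    Hypothetical helper; assumes w_large is K x K with K odd and divisible by 3 (9, 15, ...)."""
    K = w_large.shape[0]
    if K % 3 != 0 or K % 2 != 1:
        raise ValueError("kernel size must be odd and a multiple of 3")
    P = K // 2
    H, W = x.shape
    xp = np.pad(x, P)  # zero-extend so every shifted branch sees the full context
    out = np.zeros((H, W))
    for a in range(K // 3):
        for b in range(K // 3):
            tile = w_large[3 * a:3 * a + 3, 3 * b:3 * b + 3]
            branch = conv2d_same(xp, tile)  # one 3x3 path on the padded map
            # shift the branch so its 3x3 window lands where tile (a, b) sits in the large kernel
            out += branch[3 * a + 1:3 * a + 1 + H, 3 * b + 1:3 * b + 1 + W]
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
w = rng.standard_normal((9, 9))
print(np.allclose(conv2d_same(x, w), large_kernel_via_shifted_3x3(x, w)))  # True
```

On its own, this dense identity saves nothing, because all (K/3)² tiles are still used; the summary attributes SW's savings to making such shifted paths sparse.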


📝 Abstract

Large kernels make standard convolutional neural networks (CNNs) great again over transformer architectures in various vision tasks. Nonetheless, recent studies meticulously designed around increasing kernel size have shown diminishing returns or stagnation in performance. Thus, the hidden factors of large kernel convolution that affect model performance remain unexplored. In this paper, we reveal that the key hidden factors of large kernels can be summarized as two separate components: extracting features at a certain granularity and fusing features by multiple pathways. To this end, we leverage the multi-path long-distance sparse dependency relationship to enhance feature utilization via the proposed Shiftwise (SW) convolution operator with a pure CNN architecture. In a wide range of vision tasks such as classification, segmentation, and detection, SW surpasses state-of-the-art transformers and CNN architectures, including SLaK and UniRepLKNet. More importantly, our experiments demonstrate that 3×3 convolutions can replace large convolutions in existing large kernel CNNs to achieve comparable effects, which may inspire follow-up works. Code and all models are available at https://github.com/lidc54/shift-wiseConv.

Problem


Explores hidden factors in large kernel convolution affecting CNN performance.

Proposes Shiftwise convolution to enhance feature utilization in vision tasks.

Demonstrates that small (3×3) convolutions can replace large ones with comparable performance.

Innovation


Shiftwise convolution enhances feature utilization

Models multi-path, long-distance sparse dependencies

Replaces large convolutions with 3×3 kernels effectively
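To make "multi-path long-distance sparse dependency" concrete, here is a hypothetical sketch: each 3×3 path is assigned a shift (dy, dx), and the sum of the shifted paths equals a single large kernel that is zero everywhere except at the placed tiles. The helper name, offsets, and weights are made up for illustration and are not taken from the paper or its code.

```python
import numpy as np


def equivalent_sparse_kernel(paths, k=3):
    """Hypothetical helper. paths maps a shift (dy, dx) to a k x k kernel (k odd);
    the shift is where that path's window is centred relative to the output pixel.
    Returns the dense kernel whose single convolution equals the sum of all paths."""
    r = max(max(abs(dy), abs(dx)) for dy, dx in paths)  # farthest shift
    K = 2 * r + k                                        # smallest kernel covering every path
    c = K // 2                                           # centre index of the equivalent kernel
    big = np.zeros((K, K))
    for (dy, dx), w in paths.items():
        top, left = c + dy - k // 2, c + dx - k // 2
        big[top:top + k, left:left + k] += w             # place this path's 3x3 tile
    return big


# made-up offsets: one local path plus three long-distance paths
paths = {(0, 0): np.ones((3, 3)), (0, 6): np.ones((3, 3)),
         (6, 0): np.ones((3, 3)), (-6, -6): np.ones((3, 3))}
big = equivalent_sparse_kernel(paths)
print(big.shape)                     # (15, 15)
print(3 * 3 * len(paths), big.size)  # 36 225
print(np.count_nonzero(big))         # 36
```

Here four paths reach up to 6 pixels away in each direction with 36 weights, whereas a dense kernel with the same 15×15 extent would need 225.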
