🤖 AI Summary
To address the high parameter count and computational complexity of ConvNeXt in resource-constrained scenarios, this paper proposes E-ConvNeXt, a lightweight variant that integrates cross-stage partial connections (CSPNet), redesigns the Stem and Block structures, and replaces Layer Scale with channel-wise attention. These modifications significantly reduce model complexity without compromising representational capacity, striking an effective balance between feature expressiveness and computational efficiency. Experiments demonstrate that E-ConvNeXt-mini achieves 78.3% Top-1 accuracy on ImageNet at only 0.9 GFLOPs, reducing network complexity by up to 80% relative to ConvNeXt, while E-ConvNeXt-small attains 81.9% accuracy at 3.1 GFLOPs. Moreover, E-ConvNeXt transfers well to downstream tasks such as object detection. This work establishes a scalable architectural paradigm for lightweight, high-performance CNN design.
📝 Abstract
Many high-performance networks were not designed with lightweight application scenarios in mind, which has greatly restricted their scope of application. This paper takes ConvNeXt as the research object and significantly reduces its parameter scale and network complexity by integrating the Cross Stage Partial Connections mechanism and a series of optimized designs. The new network, named E-ConvNeXt, maintains high accuracy under different complexity configurations. The three core innovations of E-ConvNeXt are: (1) integrating the Cross Stage Partial Network (CSPNet) with ConvNeXt and adjusting the network structure, reducing the model's network complexity by up to 80%; (2) optimizing the Stem and Block structures to enhance the model's feature expression capability and operational efficiency; (3) replacing Layer Scale with channel attention. Experimental validation on ImageNet classification demonstrates E-ConvNeXt's superior accuracy-efficiency balance: E-ConvNeXt-mini reaches 78.3% Top-1 accuracy at 0.9 GFLOPs, and E-ConvNeXt-small reaches 81.9% Top-1 accuracy at 3.1 GFLOPs. Transfer learning tests on object detection tasks further confirm its generalization capability.
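The abstract does not spell out how the CSP mechanism is wired into a stage, so the following is only a minimal numpy sketch of the general CSPNet idea as described in the literature: the stage input is split along the channel dimension, one partition passes through the stage's blocks while the other bypasses them, and the two are concatenated at the end. The function names (`block`, `csp_stage`) and the toy block body are illustrative, not the paper's actual implementation.

```python
import numpy as np

def block(x):
    # Stand-in for a ConvNeXt-style block; a toy ReLU here,
    # purely to make the data flow of the sketch concrete.
    return np.maximum(x, 0)

def csp_stage(x, n_blocks=2):
    """Cross Stage Partial connection over a stack of blocks.

    x: feature map of shape (N, C, H, W).
    Half the channels skip the blocks entirely, which is where the
    computation savings come from; only the other half is transformed.
    """
    c = x.shape[1] // 2
    part1, part2 = x[:, :c], x[:, c:]   # channel split
    for _ in range(n_blocks):
        part2 = block(part2)            # transform one partition only
    return np.concatenate([part1, part2], axis=1)  # merge at stage end
```

Because `part1` is carried through unchanged, the blocks see only half the channels, roughly halving the per-stage compute while the concatenation preserves the stage's output width.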
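The third innovation, replacing the fixed per-channel Layer Scale multiplier with channel attention, can be illustrated with a squeeze-and-excitation-style gate: pool each channel globally, pass the result through a small bottleneck, and use a sigmoid output to reweight the channels input-dependently. This is a hedged sketch under that assumption; the random weights, `reduction` ratio, and `channel_attention` name are illustrative, not the paper's actual design.

```python
import numpy as np

def channel_attention(x, reduction=4):
    """SE-style channel reweighting for x of shape (N, C, H, W).

    Unlike Layer Scale's fixed learned per-channel constants, the
    per-channel gate here is computed from the input itself.
    """
    n, c, h, w = x.shape
    s = x.mean(axis=(2, 3))                        # squeeze: (N, C)
    # Excite: tiny bottleneck MLP; random weights stand in for
    # learned parameters in this illustration.
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c, c // reduction)) * 0.1
    w2 = rng.standard_normal((c // reduction, c)) * 0.1
    a = np.maximum(s @ w1, 0) @ w2                 # (N, C)
    gate = 1.0 / (1.0 + np.exp(-a))                # sigmoid in (0, 1)
    return x * gate[:, :, None, None]              # reweight channels
```

Since the sigmoid gate lies strictly in (0, 1), the module can only scale each channel down or leave it nearly unchanged, which mirrors the stabilizing role Layer Scale plays while adding input-dependent channel selectivity.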