🤖 AI Summary
In medical image segmentation, boundary pixels and pixels in sparse-class regions tend to capture contextual features dominated by other classes, which leads to misclassification. To address this, we propose the Dual Feature Equalization Network (DFEN), a hybrid Swin Transformer and CNN framework that equalizes pixel features at both the image level and the class level. The image-level module equalizes the contextual information of pixels across the whole image. The class-level module aggregates regions of the same class to equalize that class's pixel representations. Learned weights then combine the two to make each pixel's representation more robust. DFEN also uses Swin Transformer as both encoder and decoder, which strengthens its ability to capture long-range dependencies and spatial correlations. Extensive experiments show state-of-the-art performance on four medical benchmark datasets (BUSI, ISIC2017, ACDC, and PH²). The source code is publicly available.
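The class-level idea is to pool features within each class so that a sparse class is not swamped by its neighbors. This can be illustrated with a soft class-center computation. The sketch below is an assumption, not DFEN's published module. It borrows the soft class-center aggregation used in object-contextual representation (OCR) methods and takes a hypothetical per-pixel probability map `probs` from a coarse prediction. The names `class_level_context`, `features`, and `probs` are illustrative and do not come from the paper or its repository.

```python
import numpy as np


def class_level_context(features: np.ndarray, probs: np.ndarray) -> np.ndarray:
    """Hypothetical class-level aggregation via soft class centers.

    features: (N, C) pixel features for one image, N = H * W.
    probs:    (N, K) per-pixel class probabilities (assumed coarse prediction).
    Returns (N, C): a class-level context vector for every pixel.
    """
    # Normalize each class column over pixels so each class center is a
    # weighted mean of that class's own pixels. A class covering few pixels
    # still gets a full-strength center instead of being diluted by others.
    weights = probs / (probs.sum(axis=0, keepdims=True) + 1e-6)  # (N, K)
    centers = weights.T @ features                                 # (K, C)
    # Redistribute: each pixel receives a mix of class centers weighted by
    # its own class probabilities.
    return probs @ centers                                         # (N, C)
```

In a toy input where two pixels belong to class 0 and one pixel to class 1, both class-0 pixels receive their class mean, while the lone class-1 pixel receives its own class center. This holds regardless of how many pixels each class covers.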
📝 Abstract
Current methods for medical image segmentation primarily extract contextual feature information from the perspective of the whole image. While these methods perform effectively, none of them account for a key fact: pixels at class boundaries, and pixels in regions that contain few pixels of their class, capture more contextual feature information from other classes. This unequal contextual information leads to misclassification. In this paper, we propose a dual feature equalization network based on a hybrid Swin Transformer and Convolutional Neural Network architecture. The network augments pixel feature representations with image-level and class-level equalization feature information. First, an image-level feature equalization module equalizes the contextual information of pixels within the image. Second, a class-level feature equalization module aggregates regions of the same class to equalize the pixel feature representations of that class. Finally, the pixel feature representations are enhanced by learning weights for the image-level and class-level equalization feature information. In addition, Swin Transformer serves as both the encoder and the decoder, strengthening the model's ability to capture long-range dependencies and spatial correlations. We conducted extensive experiments on the Breast Ultrasound Images (BUSI), International Skin Imaging Collaboration (ISIC2017), Automated Cardiac Diagnosis Challenge (ACDC), and PH² datasets. The experimental results demonstrate that our method achieves state-of-the-art performance. Our code is publicly available at https://github.com/JianJianYin/DFEN.
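The final step enhances each pixel by learning weights for the two kinds of equalization information. It can be sketched as a learned convex combination added back to the original features. This is a minimal sketch under several assumptions. The image-level module is replaced by a global average of pixel features, and the class-level context is simply passed in. The two weights come from a softmax over two hypothetical learnable logits, and the residual addition is likewise an assumption. The names `image_level_context`, `fuse_equalized_features`, and `logits` are illustrative and do not come from the DFEN code.

```python
import numpy as np


def image_level_context(features: np.ndarray) -> np.ndarray:
    """Stand-in image-level context (assumption, not DFEN's module).

    features: (N, C) pixel features for one image, N = H * W.
    Returns (N, C): the image-wide mean feature copied to every pixel, so
    every pixel sees the same global context regardless of its location.
    """
    mean = features.mean(axis=0, keepdims=True)        # (1, C)
    return np.repeat(mean, features.shape[0], axis=0)  # (N, C)


def fuse_equalized_features(features: np.ndarray,
                            image_ctx: np.ndarray,
                            class_ctx: np.ndarray,
                            logits: np.ndarray) -> np.ndarray:
    """Hypothetical weighted fusion of image-level and class-level context.

    logits: shape (2,), learnable scalars. A softmax turns them into two
    non-negative weights that sum to 1.
    """
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    w = w / w.sum()
    # Residual enhancement: original features plus the weighted contexts.
    return features + w[0] * image_ctx + w[1] * class_ctx
```

Equal logits give each branch a weight of 0.5. During training, the logits would be updated by backpropagation, so the network can favor whichever context proves more useful.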