AI Summary
Medical image segmentation suffers from scarce annotated data, while existing contrastive learning methods are largely confined to image-level representations and lack effective encoder-decoder co-training. To address this, we propose MACL, a Multi-level Asymmetric Contrastive Learning framework, the first to jointly pre-train encoder and decoder by integrating feature-level, image-level, and pixel-level representations. Key innovations include: (1) a multi-level joint contrastive loss; (2) an asymmetric dual-branch network architecture; (3) voxel-wise positive/negative sample construction and cross-scale feature alignment; and (4) seamless compatibility with U-Net-based backbones. MACL outperforms 11 state-of-the-art contrastive methods across eight medical imaging datasets. With only 10% labeled data, it achieves Dice score improvements of 1.72–7.87% on four benchmarks (e.g., ACDC). Moreover, it consistently delivers SOTA performance when integrated into five distinct U-Net variants, demonstrating strong generalization and architectural flexibility.
Abstract
Medical image segmentation is a fundamental yet challenging task due to the arduous process of acquiring large volumes of high-quality labeled data from experts. Contrastive learning offers a promising but still problematic solution to this dilemma. First, existing medical contrastive learning strategies focus on extracting image-level representations, ignoring abundant multi-level representations. Furthermore, they underutilize the decoder, either by random initialization or by pre-training it separately from the encoder, thereby neglecting the potential collaboration between the two. To address these issues, we propose a novel multi-level asymmetric contrastive learning framework named MACL for volumetric medical image segmentation pre-training. Specifically, we design an asymmetric contrastive learning structure that pre-trains the encoder and decoder simultaneously to provide better initialization for segmentation models. Moreover, we develop a multi-level contrastive learning strategy that integrates correspondences across feature-level, image-level, and pixel-level representations, ensuring the encoder and decoder capture comprehensive details from representations of varying scales and granularities during pre-training. Finally, experiments on 8 medical image datasets show that MACL outperforms 11 existing contrastive learning strategies: it produces more precise predictions in the visualizations and achieves Dice scores 1.72%, 7.87%, 2.49%, and 1.48% higher than the previous best results on ACDC, MMWHS, HVSMR, and CHAOS with 10% labeled data, respectively. MACL also generalizes well across 5 variant U-Net backbones. Our code will be released at https://github.com/stevezs315/MACL.
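To make the multi-level idea concrete, here is a minimal NumPy sketch of an InfoNCE-style contrastive term applied at several representation levels and summed with weights. This is an illustrative approximation, not MACL's actual loss: the function names (`info_nce`, `multi_level_loss`), the per-level weights, and the assumption that each level contributes one InfoNCE term over matched embedding pairs are all mine, not from the paper.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE over a batch of embedding pairs: row i of `positives` is the
    positive for row i of `anchors`; all other rows act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                     # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                 # cross-entropy on matched pairs

def multi_level_loss(feat_pair, img_pair, pix_pair, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of contrastive terms at feature, image, and pixel level
    (hypothetical weighting; the paper's combination may differ)."""
    pairs = (feat_pair, img_pair, pix_pair)
    return sum(w * info_nce(a, p) for w, (a, p) in zip(weights, pairs))
```

In this sketch each "level" simply supplies its own batch of (anchor, positive) embeddings, e.g. pooled encoder features for the image level or per-voxel decoder features for the pixel level; aligned pairs drive the matched-pair similarity above the negatives' and so lower the loss.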