RADA: Region-Aware Dual-encoder Auxiliary learning for Barely-supervised Medical Image Segmentation

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

career value

182K/year

🤖 AI Summary

This work addresses the challenges of low-quality pseudo-labels and insufficient semantic understanding in weakly supervised medical image segmentation under extremely sparse annotations. To this end, the authors propose a region-aware dual-encoder auxiliary learning framework, which, for the first time, integrates the Alpha-CLIP pretrained dual-encoder into weakly supervised medical image segmentation. By leveraging text–image alignment for semantic guidance and employing a tri-view training strategy, the method achieves fine-grained image-level feature representation aligned with pixel-level predictions. Extensive experiments on the LA2018, KiTS19, and LiTS datasets demonstrate that the proposed approach significantly outperforms existing methods, effectively enhancing segmentation accuracy and generalization capability under extremely sparse annotation settings.

Technology Category

Application Category

📝 Abstract

Deep learning has greatly advanced medical image segmentation, but its success relies heavily on fully supervised learning, which requires dense annotations that are costly and time-consuming for 3D volumetric scans. Barely-supervised learning reduces annotation burden by using only a few labeled slices per volume. Existing methods typically propagate sparse annotations to unlabeled slices through geometric continuity to generate pseudo-labels, but this strategy lacks semantic understanding, often resulting in low-quality pseudo-labels. Furthermore, medical image segmentation is inherently a pixel-level visual understanding task, where accuracy fundamentally depends on the quality of local, fine-grained visual features. Inspired by this, we propose RADA, a novel Region-Aware Dual-encoder Auxiliary learning pipeline which introduces a dual-encoder framework pre-trained on Alpha-CLIP to extract fine-grained, region-specific visual features from the original images and limited annotations. The framework combines image-level fine-grained visual features with text-level semantic guidance, providing region-aware semantic supervision that bridges image-level semantics and pixel-level segmentation. Integrated into a triple-view training framework, RADA achieves SOTA performance under extremely sparse annotation settings on LA2018, KiTS19 and LiTS, demonstrating robust generalization across diverse datasets.

Problem

Research questions and friction points this paper is trying to address.

barely-supervised learning

medical image segmentation

pseudo-labeling

sparse annotation

semantic understanding

Innovation

Methods, ideas, or system contributions that make the work stand out.

Region-Aware

Dual-encoder

Alpha-CLIP