🤖 AI Summary
To address the scarcity of multi-modal annotations in medical image segmentation, this paper proposes a partially supervised, unpaired multi-modal collaborative learning framework. The method decouples modality alignment from semantic segmentation, integrating contrastive cross-modal representation learning, adversarial unpaired image translation, semi-supervised consistency regularization, and uncertainty-aware pseudo-labeling. This design significantly reduces reliance on fully paired, pixel-level annotations. On the BraTS and MMWHS benchmarks, the framework reaches 92% of the Dice score of fully supervised models while using only 10% of the labeled data, substantially outperforming state-of-the-art unpaired methods. The approach establishes a new paradigm for high-accuracy multi-modal segmentation under low annotation budgets, balancing representational alignment with task-specific learning without requiring strict inter-modality correspondence.
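
To make the training objective more concrete, the sketch below shows one plausible way the loss ingredients named in the summary could be combined in PyTorch: an InfoNCE term for cross-modal alignment, a consistency term between augmented views, and a confidence-masked pseudo-label term. The function names (`info_nce`, `uncertainty_masked_pseudo_loss`, `consistency_loss`, `total_objective`), the confidence threshold, and the loss weights are illustrative assumptions rather than the paper's actual implementation; the adversarial translation term is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def info_nce(z_a, z_b, temperature=0.1):
    """Contrastive cross-modal alignment (InfoNCE).
    z_a, z_b: (N, D) embeddings of the same N cases seen through two modalities;
    the row-wise pairing here is a hypothetical stand-in for whatever
    correspondence the framework actually exploits."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature               # (N, N) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)

def uncertainty_masked_pseudo_loss(student_logits, teacher_logits, threshold=0.9):
    """Uncertainty-aware pseudo-labeling: only pixels where the teacher is
    confident (low uncertainty) contribute to the unsupervised loss."""
    with torch.no_grad():
        probs = teacher_logits.softmax(dim=1)          # (B, C, H, W)
        conf, pseudo = probs.max(dim=1)                 # per-pixel confidence + hard label
        mask = (conf > threshold).float()               # keep only confident pixels
    loss = F.cross_entropy(student_logits, pseudo, reduction="none")  # (B, H, W)
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)

def consistency_loss(logits_weak, logits_strong):
    """Semi-supervised consistency: predictions under two augmentations should agree."""
    return F.mse_loss(logits_strong.softmax(dim=1), logits_weak.softmax(dim=1).detach())

def total_objective(sup_loss, align_loss, cons_loss, pseudo_loss,
                    lam_align=0.1, lam_cons=1.0, lam_pseudo=1.0):
    """Weighted sum of the decoupled terms; the weights are illustrative defaults."""
    return sup_loss + lam_align * align_loss + lam_cons * cons_loss + lam_pseudo * pseudo_loss
```

Keeping the alignment term (`info_nce`) separate from the segmentation terms mirrors the summary's point that modality alignment is decoupled from the semantic task, so each piece can be weighted or dropped independently depending on how many labels and which modalities are available.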