TCSA-UDA: Text-Driven Cross-Semantic Alignment for Unsupervised Domain Adaptation in Medical Image Segmentation

📅 2025-11-08
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
In medical image segmentation, significant domain shifts between modalities (e.g., CT and MRI) severely hinder unsupervised domain adaptation (UDA). To address this, we propose a text-driven cross-semantic alignment framework, the first to incorporate linguistic semantics into medical image domain adaptation. Our method leverages a pre-trained vision-language model to construct modality-invariant joint representations; introduces a vision-language covariance cosine loss to enforce distributional consistency across modalities; and designs a semantic-prototype-based class-level alignment module to achieve pixel-wise semantic-consistent feature mapping. Evaluated on multi-center datasets for cardiac, abdominal, and brain tumor segmentation, our approach substantially mitigates domain shift and consistently outperforms existing state-of-the-art UDA methods in segmentation accuracy. This work establishes a novel paradigm of language-guided unsupervised domain adaptation for medical imaging.

๐Ÿ“ Abstract
Unsupervised domain adaptation for medical image segmentation remains a significant challenge due to substantial domain shifts across imaging modalities, such as CT and MRI. While recent vision-language representation learning methods have shown promise, their potential in UDA segmentation tasks remains underexplored. To address this gap, we propose TCSA-UDA, a Text-driven Cross-Semantic Alignment framework that leverages domain-invariant textual class descriptions to guide visual representation learning. Our approach introduces a vision-language covariance cosine loss to directly align image encoder features with inter-class textual semantic relations, encouraging semantically meaningful and modality-invariant feature representations. Additionally, we incorporate a prototype alignment module that aligns class-wise pixel-level feature distributions across domains using high-level semantic prototypes. This mitigates residual category-level discrepancies and enhances cross-modal consistency. Extensive experiments on challenging cross-modality cardiac, abdominal, and brain tumor segmentation benchmarks demonstrate that our TCSA-UDA framework significantly reduces domain shift and consistently outperforms state-of-the-art UDA methods, establishing a new paradigm for integrating language-driven semantics into domain-adaptive medical image analysis.
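The abstract does not give the exact formulation of the vision-language covariance cosine loss. A minimal sketch of how such a loss could work, assuming class-wise visual prototypes pooled from the image encoder and text-encoder embeddings of the class descriptions are already available (all function and variable names here are hypothetical, not from the paper):

```python
import numpy as np

def cosine_sim_matrix(x):
    # Row-normalize, then compute pairwise cosine similarities (C x C).
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T

def vl_covariance_cosine_loss(visual_protos, text_embeds):
    """Hedged sketch of a vision-language covariance cosine loss.

    visual_protos: (C, d_v) class-wise visual prototypes pooled from
                   image encoder features.
    text_embeds:   (C, d_t) text-encoder embeddings of the C class
                   descriptions (d_v and d_t need not match, since the
                   comparison is between the C x C similarity matrices).

    Penalizes disagreement between the inter-class cosine-similarity
    structure of the visual prototypes and that of the text embeddings,
    which pushes the image encoder toward the modality-invariant
    semantic relations encoded by the text.
    """
    s_vis = cosine_sim_matrix(visual_protos)
    s_txt = cosine_sim_matrix(text_embeds)
    # Compare only off-diagonal entries (diagonals are always 1).
    mask = ~np.eye(len(s_vis), dtype=bool)
    return float(np.mean((s_vis[mask] - s_txt[mask]) ** 2))
```

Because only the relational structure is matched, the text embeddings can come from a frozen vision-language model while the image encoder is trained; the actual loss in the paper may differ in normalization and weighting.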
Problem

Research questions and friction points this paper is trying to address.

Addressing domain shifts across medical imaging modalities like CT and MRI
Aligning visual features with textual semantics for modality-invariant representations
Reducing category-level discrepancies in unsupervised medical image segmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages domain-invariant textual class descriptions
Uses vision-language covariance cosine loss for alignment
Aligns pixel-level feature distributions with semantic prototypes
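The prototype alignment module is only described at a high level. A minimal sketch under common assumptions (masked average pooling for class prototypes, a cosine-distance alignment term between domains; all names are illustrative, not the paper's API):

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Class-wise semantic prototypes via masked average pooling.

    features: (N, d) pixel-level features flattened over an image.
    labels:   (N,) per-pixel class indices (ground truth on the source
              domain; typically pseudo-labels on the target domain).
    Returns (num_classes, d); rows for absent classes stay zero.
    """
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos

def prototype_alignment_loss(src_protos, tgt_protos, eps=1e-8):
    # 1 - cosine similarity per class, averaged over classes that are
    # present (non-zero prototype) in both domains.
    num = np.sum(src_protos * tgt_protos, axis=1)
    den = (np.linalg.norm(src_protos, axis=1)
           * np.linalg.norm(tgt_protos, axis=1)) + eps
    valid = den > 2 * eps  # skip classes absent in either domain
    return float(np.mean(1.0 - num[valid] / den[valid]))
```

Pulling same-class prototypes together across domains is one standard way to reduce the residual category-level discrepancies the bullets mention; the paper's exact distance and weighting may differ.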
Lalit Maurya
School of Computing, University of Portsmouth, Portsmouth, PO1 3HE, UK
Honghai Liu
University of Portsmouth
Human-Machine Systems, Multi-Sensory Data Fusion and Information Analytics, Bio-Mechatronics, Pattern Recognition, Intelligent Robotics
R. Zwiggelaar
Department of Computer Science, Aberystwyth University, Aberystwyth, SY23 3DB, UK