Structure-aware Semantic Discrepancy and Consistency for 3D Medical Image Self-supervised Learning

📅 2025-07-03
🤖 AI Summary
Current 3D medical image self-supervised learning (mSSL) methods predominantly partition volumes into fixed-size patches, overlooking how anatomical structures vary in spatial location, scale, and morphology. The resulting semantic representations are coarse-grained and insufficiently discriminative. To address this, we propose $S^2DC$, a structure-aware framework that jointly optimizes semantic discrepancy and semantic consistency. We increase inter-patch semantic discrepancy via optimal transport and enhance intra-structure semantic consistency by leveraging neighborhood similarity distributions, then align patch-level and structure-level representations. Evaluated across 10 datasets, 4 downstream tasks, and 3 medical imaging modalities, the method consistently outperforms state-of-the-art mSSL methods.

📝 Abstract
3D medical image self-supervised learning (mSSL) holds great promise for medical analysis. Effectively supporting broader applications requires considering anatomical structure variations in location, scale, and morphology, which are crucial for capturing meaningful distinctions. However, previous mSSL methods partition images with fixed-size patches, often ignoring the structure variations. In this work, we introduce a novel perspective on 3D medical images with the goal of learning structure-aware representations. We assume that patches within the same structure share the same semantics (semantic consistency) while those from different structures exhibit distinct semantics (semantic discrepancy). Based on this assumption, we propose an mSSL framework named $S^2DC$, achieving Structure-aware Semantic Discrepancy and Consistency in two steps. First, $S^2DC$ enforces distinct representations for different patches to increase semantic discrepancy by leveraging an optimal transport strategy. Second, $S^2DC$ advances semantic consistency at the structural level based on neighborhood similarity distribution. By bridging patch-level and structure-level representations, $S^2DC$ achieves structure-aware representations. Thoroughly evaluated across 10 datasets, 4 tasks, and 3 modalities, our proposed method consistently outperforms the state-of-the-art methods in mSSL.
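The abstract names an optimal transport strategy for pushing patch representations apart but gives no formulation. The sketch below is one common way such a step is set up, not the authors' implementation: patch-to-prototype similarity scores are turned into a balanced soft assignment with Sinkhorn-Knopp iterations. The `sinkhorn` function, the prototype notion, `epsilon`, `n_iters`, and the toy scores are all assumptions introduced for illustration.

```python
import math


def sinkhorn(scores, epsilon=0.05, n_iters=50):
    """Entropy-regularised optimal transport via Sinkhorn-Knopp iterations.

    scores[i][j] is the similarity of patch i to a hypothetical prototype j.
    The returned plan has rows summing to 1/n_patches and, once converged,
    columns summing to about 1/n_prototypes (a balanced assignment).
    """
    n, k = len(scores), len(scores[0])
    # Gibbs kernel; subtracting the maximum keeps the exponentials stable.
    top = max(max(row) for row in scores)
    plan = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    for _ in range(n_iters):
        # Rescale columns so each prototype receives total mass 1/k.
        col_sums = [sum(plan[i][j] for i in range(n)) for j in range(k)]
        plan = [[plan[i][j] / (col_sums[j] * k) for j in range(k)]
                for i in range(n)]
        # Rescale rows so each patch sends total mass 1/n.
        row_sums = [sum(row) for row in plan]
        plan = [[v / (row_sums[i] * n) for v in row]
                for i, row in enumerate(plan)]
    return plan


# Toy input (assumed): four patches that all prefer prototype 0.
toy_scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
toy_plan = sinkhorn(toy_scores)
toy_assignment = [max(range(2), key=lambda j: row[j]) for row in toy_plan]
```

In this toy case every patch scores highest on the first prototype, yet the balanced column constraint spreads the four patches evenly over both prototypes. This is the sense in which a transport-based assignment can discourage all patches from collapsing onto the same representation.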
Problem

Research questions and friction points this paper is trying to address.

Address anatomical structure variations in 3D medical images
Improve semantic discrepancy and consistency in self-supervised learning
Bridge patch-level and structure-level representations effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structure-aware semantic discrepancy and consistency
Optimal transport for distinct patch representations
Neighborhood similarity for structural consistency
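The neighborhood similarity idea in the last item can be illustrated with a small sketch. The source does not specify how the distribution is built or compared, so everything here is an assumption: each patch embedding is described by a softmax over its cosine similarities to a shared set of reference embeddings, and two patches are treated as consistent when the KL divergence between their distributions is small. The function names, the `temperature` value, and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def neighborhood_distribution(anchor, neighbors, temperature=0.1):
    """Softmax over cosine similarities between one patch and its neighbours."""
    logits = [cosine(anchor, n) / temperature for n in neighbors]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); eps guards against log(0) for near-empty bins."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


# Toy embeddings (assumed): a and b are close, c points elsewhere.
references = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dist_a = neighborhood_distribution([1.0, 0.0], references)
dist_b = neighborhood_distribution([0.9, 0.1], references)
dist_c = neighborhood_distribution([0.0, 1.0], references)
same_structure_gap = kl_divergence(dist_a, dist_b)
different_structure_gap = kl_divergence(dist_a, dist_c)
```

With these toy vectors, the divergence between the two nearby patches is much smaller than the divergence between patches pointing in different directions. A consistency objective of this kind would minimise the former for patches assumed to lie in the same structure.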
Authors

Tan Pan
Fudan University
Computer Vision, AI4Science, Self-supervised Learning
Zhaorui Tan
University of Liverpool, PhD student
Generalization, Text-to-Image, Generative models
Kaiyu Guo
The University of Queensland; Shanghai Academy of Artificial Intelligence for Science
Dongli Xu
KU Leuven
Computer Vision
Weidi Xu
Infly Technology
Chen Jiang
Shanghai Academy of Artificial Intelligence for Science
Xin Guo
Shanghai Academy of Artificial Intelligence for Science
Yuan Qi
Artificial Intelligence Innovation and Incubation Institute, Fudan University; Shanghai Academy of Artificial Intelligence for Science; Zhongshan Hospital, Fudan University
Yuan Cheng
Artificial Intelligence Innovation and Incubation Institute, Fudan University; Shanghai Academy of Artificial Intelligence for Science