🤖 AI Summary
In self-supervised representation learning for 3D medical image segmentation, existing methods lack a structured latent space: implicitly learning from uncorrelated views does not by itself yield the semantically consistent representations that downstream tasks require.
Method: We propose the first explicit cross-view feature alignment mechanism, which enforces consistency among multi-view representations in the latent space, thereby guiding structured representation learning and mitigating false-positive interference. The framework jointly optimizes contrastive and reconstruction objectives and can be instantiated with either a Primus ViT or a ResEnc CNN backbone.
Contribution/Results: Extensive experiments on multiple 3D segmentation benchmarks demonstrate significant performance gains over state-of-the-art baselines, and our method achieved first and second place in the MICCAI 2025 SSL3D Challenge with the Primus ViT and ResEnc CNN backbones, respectively. The pre-trained models and code are publicly released.
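The joint contrastive-plus-reconstruction objective described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's exact formulation: the InfoNCE form of the alignment term, the function names, and the loss weights `w_align`/`w_recon` are all assumptions for the sake of the example.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize rows to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE-style alignment loss between paired view embeddings.

    z1, z2: (N, D) latent embeddings of two views; row i of z1 is the
    positive pair of row i of z2, all other rows act as negatives.
    """
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    logits = z1 @ z2.T / temperature              # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # matched pairs sit on the diagonal

def joint_loss(z1, z2, recon, target, w_align=1.0, w_recon=1.0):
    """Hypothetical joint objective: cross-view alignment + MSE reconstruction."""
    return w_align * info_nce(z1, z2) + w_recon * np.mean((recon - target) ** 2)
```

Driving both terms jointly means the encoder cannot satisfy the reconstruction term with view-specific detail alone; the alignment term explicitly pulls embeddings of corresponding views together, which is the structure the summary argues does not emerge implicitly.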
📝 Abstract
Many recent approaches in representation learning implicitly assume that uncorrelated views of a data point are sufficient to learn meaningful representations for various downstream tasks. In this work, we challenge this assumption and demonstrate that meaningful structure in the latent space does not emerge naturally; it must be explicitly induced. We propose a method that aligns representations from different views of the data, integrating complementary information without inducing false positives. Our experiments show that our proposed self-supervised learning method, Consistent View Alignment, improves performance on downstream tasks, highlighting the critical role of structured view alignment in learning effective representations. Our method achieved first and second place in the MICCAI 2025 SSL3D challenge when using a Primus vision transformer and a ResEnc convolutional neural network, respectively. The code and pretrained model weights are released at https://github.com/Tenbatsu24/LatentCampus.