Medverse: A Universal Model for Full-Resolution 3D Medical Image Segmentation, Transformation and Enhancement

📅 2025-09-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing in-context learning (ICL) models for medical imaging struggle to simultaneously achieve high-fidelity predictions and holistic anatomical understanding, and they lack unified modeling across tasks, anatomical regions, and multi-center clinical sites. Method: We propose the first general-purpose 3D medical image analysis model, built upon a novel autoregressive ICL framework and a blockwise cross-attention mechanism that enable multi-scale anatomical awareness and long-range dependency modeling. Leveraging progressive autoregressive modeling and spatial sparsification, the model jointly optimizes segmentation, transformation, and enhancement across 22 diverse datasets, generating full-resolution 3D volumetric outputs. Results: Extensive experiments demonstrate substantial gains over state-of-the-art methods on unseen clinical centers, organs, species, and imaging modalities, showing strong zero-shot generalization. This work establishes a new paradigm for foundation models in medical imaging.

📝 Abstract
In-context learning (ICL) offers a promising paradigm for universal medical image analysis, enabling models to perform diverse image processing tasks without retraining. However, current ICL models for medical imaging remain limited in two critical aspects: they cannot simultaneously achieve high-fidelity predictions and global anatomical understanding, and there is no unified model trained across diverse medical imaging tasks (e.g., segmentation and enhancement) and anatomical regions. As a result, the full potential of ICL in medical imaging remains underexplored. Thus, we present Medverse, a universal ICL model for 3D medical imaging, trained on 22 datasets covering diverse tasks in universal image segmentation, transformation, and enhancement across multiple organs, imaging modalities, and clinical centers. Medverse employs a next-scale autoregressive in-context learning framework that progressively refines predictions from coarse to fine, generating consistent, full-resolution volumetric outputs and enabling multi-scale anatomical awareness. We further propose a blockwise cross-attention module that facilitates long-range interactions between context and target inputs while preserving computational efficiency through spatial sparsity. Medverse is extensively evaluated on a broad collection of held-out datasets covering previously unseen clinical centers, organs, species, and imaging modalities. Results demonstrate that Medverse substantially outperforms existing ICL baselines and establishes a novel paradigm for in-context learning. Code and model weights are publicly available at https://github.com/jiesihu/Medverse.
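The coarse-to-fine, next-scale idea described in the abstract can be pictured with a minimal NumPy sketch: a prediction is made at a coarse resolution, then repeatedly upsampled and refined at each finer scale, so each scale conditions on the previous one. This is only an illustrative toy under stated assumptions, not the paper's implementation; `predict_residual` is a hypothetical stand-in for the model's per-scale prediction head.

```python
import numpy as np

rng = np.random.default_rng(0)

def upsample2x(vol):
    # Nearest-neighbor 2x upsampling along each spatial axis of a 3D volume.
    return vol.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)

def predict_residual(vol):
    # Hypothetical stand-in for a learned per-scale refinement head:
    # here it just returns a small random correction of the same shape.
    return 0.1 * rng.standard_normal(vol.shape)

def next_scale_autoregressive(coarse_pred, n_scales):
    """Coarse-to-fine prediction: each scale is conditioned on the
    upsampled output of the previous (coarser) scale."""
    pred = coarse_pred
    scales = [pred]
    for _ in range(n_scales - 1):
        pred = upsample2x(pred)               # move to the next, finer scale
        pred = pred + predict_residual(pred)  # refine at this scale
        scales.append(pred)
    return scales

coarse = rng.standard_normal((4, 4, 4))       # coarsest-scale prediction
scales = next_scale_autoregressive(coarse, n_scales=3)
print([s.shape for s in scales])
# [(4, 4, 4), (8, 8, 8), (16, 16, 16)]
```

The same loop run for more scales would reach full volumetric resolution, which is how a model of this shape can emit full-resolution 3D outputs without predicting every voxel in one pass.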
Problem

Research questions and friction points this paper is trying to address.

Achieving high-fidelity predictions with global anatomical understanding simultaneously
Lack of unified model across diverse medical imaging tasks and anatomical regions
Underexplored potential of in-context learning in 3D medical imaging applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Next-scale autoregressive in-context learning framework
Blockwise cross-attention module for long-range interactions
Trained across 22 datasets for universal medical imaging
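The blockwise cross-attention idea listed above can be sketched in a few lines of NumPy: target tokens are partitioned into spatial blocks, and each block attends only to the corresponding block of context tokens, so the attention cost stays linear in the number of blocks rather than quadratic in the full token count. This is a simplified single-head sketch under assumed shapes, not the paper's module; the function name and block partitioning are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def blockwise_cross_attention(target, context, block):
    """target, context: (N, d) token sequences from spatially aligned
    grids; block: tokens per block. Each target block attends only to
    the matching context block (spatial sparsity)."""
    n, d = target.shape
    out = np.empty_like(target)
    for s in range(0, n, block):
        q = target[s:s + block]    # queries from the target input
        kv = context[s:s + block]  # keys/values from the context pair
        attn = softmax(q @ kv.T / np.sqrt(d))  # (block, block) weights
        out[s:s + block] = attn @ kv
    return out

rng = np.random.default_rng(0)
tgt = rng.standard_normal((16, 8))   # 16 target tokens, dim 8
ctx = rng.standard_normal((16, 8))   # 16 context tokens, dim 8
y = blockwise_cross_attention(tgt, ctx, block=4)
print(y.shape)
# (16, 8)
```

With full attention the score matrix would be 16x16; here it is four 4x4 matrices, which is the computational saving that spatial sparsity buys while still letting context information flow into every target block.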