3D MRI Image Pretraining via Controllable 2D Slice Navigation Task

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes a self-supervised pretraining paradigm for 3D MRI. Most existing objectives treat a scan as a static collection of slices, patches, or voxels, which underuses its spatial structure. The method instead renders each 3D volume as a controllable sequence of 2D slices and pairs every frame with the action that produced it (a change in slice position, orientation, or scale), so that the resulting video-action pairs serve as the self-supervisory signal. A slice observation encoder (tokenizer) is combined with an action-conditioned latent dynamics model, which frames 3D anatomical understanding as a sequence prediction task. The approach is evaluated against static reconstruction baselines, encoder-only pretraining, and dynamics variants without action alignment on several anatomical and spatial downstream tasks; the authors conclude that slice navigation is a useful complement to static pretraining objectives.
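The rendering step summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the state layout `[x, y, z, yaw, pitch, scale]`, the function names `render_slice` and `rollout`, the additive form of the actions, and nearest-neighbour sampling are all assumptions chosen for brevity, and a synthetic array stands in for an MRI volume.

```python
import numpy as np


def rotation_matrix(yaw, pitch):
    """Rotation about the z axis (yaw) composed with one about the x axis (pitch)."""
    cz, sz = np.cos(yaw), np.sin(yaw)
    cx, sx = np.cos(pitch), np.sin(pitch)
    rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    return rz @ rx


def render_slice(volume, center, yaw, pitch, scale, size=32):
    """Sample a size x size plane from `volume` (nearest neighbour, zero outside)."""
    rot = rotation_matrix(yaw, pitch)
    u_axis, v_axis = rot[:, 0], rot[:, 1]  # in-plane directions of the slice
    offsets = (np.arange(size) - (size - 1) / 2.0) * scale  # voxel spacing = scale
    uu, vv = np.meshgrid(offsets, offsets, indexing="ij")
    points = center + uu[..., None] * u_axis + vv[..., None] * v_axis  # (size, size, 3)
    idx = np.rint(points).astype(int)
    shape = np.array(volume.shape)
    inside = np.all((idx >= 0) & (idx < shape), axis=-1)
    idx = np.clip(idx, 0, shape - 1)
    sampled = volume[idx[..., 0], idx[..., 1], idx[..., 2]]
    return np.where(inside, sampled, 0.0)


def rollout(volume, start_state, actions, size=32):
    """Apply each action as a delta on the state and render one frame per state."""
    state = np.asarray(start_state, dtype=float)
    actions = np.asarray(actions, dtype=float)
    frames = [render_slice(volume, state[:3], state[3], state[4], state[5], size)]
    for action in actions:
        state = state + action  # hypothetical additive action model
        frames.append(render_slice(volume, state[:3], state[3], state[4], state[5], size))
    return np.stack(frames), actions


# Synthetic stand-in for an MRI volume: intensity equals the z index.
volume = np.zeros((32, 32, 32)) + np.arange(32.0)
actions = [[0, 0, 1, 0, 0, 0], [0, 0, 0, 0.1, 0, 0]]  # step along z, then rotate
frames, actions = rollout(volume, [16, 16, 16, 0, 0, 1], actions, size=9)
print(frames.shape, actions.shape)  # (3, 9, 9) (2, 6)
```

Each rollout yields one more frame than actions, so frame `t` and action `t` together determine frame `t + 1`; that pairing is what a video-action sequence refers to here. A real pipeline would likely use trilinear or higher-order interpolation rather than rounding to the nearest voxel.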

📝 Abstract

Self-supervised pretraining has become the mainstream approach for learning MRI representations from unlabeled scans. However, most existing objectives still treat each scan primarily as a static aggregation of slices, patches, or volumes. We ask whether there is an intrinsic self-supervision signal, different from reconstructing masked patches, that arises from transforming 3D volumes into controllable 2D rendered sequences: by rendering slices at continuous positions, orientations, and scales, a 3D volume can be converted into dense video-action sequences whose controls are the action trajectories. We study this formulation with an action-conditioned pretraining objective, in which a tokenizer encodes slice observations and a latent dynamics model predicts the evolution of the latent features. Across representative anatomical and spatial downstream tasks, the proposed pretraining is evaluated against standard static-volume baselines, tokenizer-only pretraining, and dynamics variants without aligned actions. The results suggest that controllable MRI slice navigation provides a useful complementary pretraining interface for learning anatomical and spatial representations from large unlabeled MRI collections.
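The action-conditioned objective described in the abstract can be sketched in a few lines. This is a toy under stated assumptions, not the paper's model: linear maps stand in for the tokenizer and the latent dynamics network, the mean-squared-error loss is an assumption (the abstract does not name a loss), and the names `encode`, `predict_next`, and `dynamics_loss` are hypothetical.

```python
import numpy as np


def encode(frames, w_enc):
    """Toy 'tokenizer': flatten each frame and project it linearly to a latent."""
    return frames.reshape(frames.shape[0], -1) @ w_enc


def predict_next(latents, actions, w_z, w_a):
    """Toy dynamics: the next latent is linear in the current latent and the action."""
    return latents @ w_z + actions @ w_a


def dynamics_loss(frames, actions, w_enc, w_z, w_a):
    """MSE between predicted next latents and the encoded next frames."""
    z = encode(frames, w_enc)                        # (T + 1, d)
    pred = predict_next(z[:-1], actions, w_z, w_a)   # (T, d)
    return float(np.mean((pred - z[1:]) ** 2))


rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 8, 8))    # T + 1 = 5 rendered slices (random stand-ins)
actions = rng.normal(size=(4, 6))      # T = 4 actions, 6 hypothetical control dims
w_enc = rng.normal(size=(64, 16)) * 0.1
w_z = rng.normal(size=(16, 16)) * 0.1
w_a = rng.normal(size=(6, 16)) * 0.1
print(dynamics_loss(frames, actions, w_enc, w_z, w_a))
```

In an actual model the linear maps would be replaced by learned networks and the loss minimized by gradient descent; the sketch only evaluates the objective for fixed weights.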

Problem

Research questions and friction points this paper is trying to address.

self-supervised pretraining

3D MRI

controllable slice navigation

anatomical representation

spatial representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

controllable slice navigation

self-supervised pretraining

3D MRI representation learning