VoxCor: Training-Free Volumetric Features for Multimodal Voxel Correspondence

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of obtaining anatomically consistent, reusable voxel-level representations for cross-modal 3D medical image analysis. Existing pipelines built on frozen 2D Vision Transformers (ViTs) typically extract features along a single anatomical axis and adapt them inside a registration solver one image pair at a time, so the resulting representations do not transfer to new volumes. The authors propose VoxCor, a training-free fit-and-transform framework that runs a frozen 2D ViT along all three anatomical planes (triplanar inference) and applies a compact closed-form weighted partial least squares (WPLS) projection. The projection is fitted offline from voxel correspondences and selects modality-stable anatomical directions in the triplanar feature space. At transform time, new volumes need only triplanar inference and a linear projection, with no fine-tuning or registration, and voxel correspondences are queried by nearest-neighbor search. On intra-subject Abdomen MR–CT and inter-subject HCP T2w–T1w tasks, VoxCor improves the hardest cross-subject, cross-modality transfer settings, reduces sensitivity to encoder choice, and yields registration performance competitive with handcrafted descriptors and learned 3D features.
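The triplanar inference step can be illustrated with a minimal sketch. It assumes a hypothetical stand-in encoder (`toy_slice_encoder`, which is not a ViT and not part of VoxCor) that returns per-pixel features for one 2D slice, and it assumes the three per-axis feature volumes are fused by simple channel concatenation; the summary does not specify how VoxCor combines them.

```python
import numpy as np

def toy_slice_encoder(slice_2d: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a frozen 2D encoder (NOT a ViT).

    Returns per-pixel features of shape (H, W, 3): intensity plus
    the two in-plane intensity gradients.
    """
    grad_rows, grad_cols = np.gradient(slice_2d)
    return np.stack([slice_2d, grad_rows, grad_cols], axis=-1)

def triplanar_features(volume: np.ndarray, encoder) -> np.ndarray:
    """Encode a 3D volume slice by slice along each of its three axes.

    The per-axis feature volumes are concatenated along the channel
    dimension (an assumption made for this sketch).
    """
    per_axis = []
    for axis in range(3):
        # Bring the slicing axis to the front: (n_slices, H, W).
        slices = np.moveaxis(volume, axis, 0)
        # Encode every slice: (n_slices, H, W, C).
        encoded = np.stack([encoder(s) for s in slices])
        # Restore the original voxel layout: (X, Y, Z, C).
        per_axis.append(np.moveaxis(encoded, 0, axis))
    return np.concatenate(per_axis, axis=-1)

# Toy usage on a random volume.
volume = np.random.default_rng(0).normal(size=(4, 5, 6))
features = triplanar_features(volume, toy_slice_encoder)
```

Each voxel ends up with one feature vector per viewing direction, so a downstream projection can weigh the three views against each other rather than relying on a single axis.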

📝 Abstract

Cross-modal 3D medical image analysis requires voxelwise representations that remain anatomically consistent across imaging contrasts, scanners, and acquisition protocols. Recent work has shown that frozen 2D Vision Transformer (ViT) foundation models can support such representations, but typical pipelines extract features along a single anatomical axis and adapt those features inside a registration solver for one image pair at a time, leaving complementary viewing directions unused and producing representations that do not transfer to new volumes. We introduce VoxCor, a training-free fit–transform method for reusable volumetric feature representations from frozen 2D ViT foundation models. During an offline fitting phase, VoxCor combines triplanar ViT inference with a compact closed-form weighted partial least squares (WPLS) projection that uses fitting-time voxel correspondences to select modality-stable anatomical directions in the triplanar feature space. At transform time, new volumes are mapped by triplanar ViT inference and linear projection alone, without fine-tuning or registration. Voxel correspondences can then be queried directly by nearest-neighbor search. We evaluate VoxCor on intra-subject Abdomen MR–CT and inter-subject HCP T2w–T1w tasks using deformable registration, voxelwise k-nearest-neighbor segmentation, and segmentation-center landmark localization. VoxCor improves the hardest cross-subject, cross-modality transfer settings, reduces encoder sensitivity for dense correspondence transfer, and yields registration performance competitive with handcrafted descriptors and learned 3D features. This positions VoxCor as a reusable feature layer for downstream multimodal analysis beyond pairwise registration. Code, configuration files, and implementation details are publicly available on GitHub at https://github.com/guneytombak/VoxCor.
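The fit-then-transform workflow in the abstract can be sketched on synthetic data. This is not the authors' implementation: the abstract does not give the WPLS formulation, so the sketch assumes one common closed-form variant (an SVD of the weighted cross-covariance between paired features, often called PLS-SVD) with a separate projection per modality. The function names, dictionary keys, and the synthetic data model are all hypothetical.

```python
import numpy as np

def fit_wpls(feat_a, feat_b, weights, n_components):
    """Fit projections from paired features (assumed PLS-SVD variant).

    feat_a, feat_b: (n, d) features at corresponding voxels.
    weights: (n,) non-negative confidence per correspondence.
    """
    w = weights / weights.sum()
    mean_a, mean_b = w @ feat_a, w @ feat_b        # weighted means
    a_c, b_c = feat_a - mean_a, feat_b - mean_b
    cross_cov = a_c.T @ (w[:, None] * b_c)         # (d, d) weighted cross-covariance
    u, _, vt = np.linalg.svd(cross_cov, full_matrices=False)
    return {"mean_a": mean_a, "proj_a": u[:, :n_components],
            "mean_b": mean_b, "proj_b": vt[:n_components].T}

def transform(feats, mean, proj):
    """Map features of a new volume with the fitted linear projection."""
    return (feats - mean) @ proj

def nearest_neighbors(query, reference):
    """Index of the closest reference vector for each query vector."""
    d2 = ((query ** 2).sum(1)[:, None] + (reference ** 2).sum(1)[None, :]
          - 2.0 * query @ reference.T)
    return d2.argmin(axis=1)

# Synthetic stand-in for two modalities: a shared latent "anatomy"
# seen through two different orthonormal mixings plus small noise.
rng = np.random.default_rng(0)
n_fit, n_new, k, d = 500, 200, 4, 16
mix_a = np.linalg.qr(rng.normal(size=(d, k)))[0].T
mix_b = np.linalg.qr(rng.normal(size=(d, k)))[0].T

def simulate(n):
    latent = rng.normal(size=(n, k))
    return (latent @ mix_a + 0.01 * rng.normal(size=(n, d)),
            latent @ mix_b + 0.01 * rng.normal(size=(n, d)))

# Offline fit on paired samples, then transform unseen samples.
fit_a, fit_b = simulate(n_fit)
model = fit_wpls(fit_a, fit_b, rng.uniform(0.5, 1.0, n_fit), k)
new_a, new_b = simulate(n_new)
z_a = transform(new_a, model["mean_a"], model["proj_a"])
z_b = transform(new_b, model["mean_b"], model["proj_b"])
matches = nearest_neighbors(z_a, z_b)
accuracy = (matches == np.arange(n_new)).mean()
```

In this toy setting the unseen samples are matched across "modalities" using only the stored means and projection matrices, which mirrors the abstract's point that transform time needs no fine-tuning or registration. Real triplanar ViT features would not follow this idealized model, so the matching accuracy here says nothing about VoxCor's reported results.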

Problem

Research questions and friction points this paper is trying to address.

multimodal

voxel correspondence

3D medical image analysis

anatomical consistency

cross-modality

Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free

volumetric features

multimodal correspondence