🤖 AI Summary
This paper addresses generic anatomical localization in multimodal medical imaging (CT/MRI), proposing a GPS-inspired localization paradigm: mapping arbitrary image coordinates to a standardized atlas space to uniformly support downstream tasks including matching, registration, classification, and segmentation. Methodologically, the authors design a lightweight regression network that combines sparse input sampling with a modality-agnostic architecture for end-to-end coordinate estimation. Key contributions include: (1) the first cross-modal anatomical coordinate regression framework; (2) zero-shot cross-modal generalization without modality-specific fine-tuning; and (3) sub-millisecond single-point localization latency, enabling real-time inference on commodity CPUs. Experiments on CT and MRI datasets demonstrate high accuracy and strong generalization, establishing a foundational localization capability for multimodal medical image analysis.
📝 Abstract
We introduce a new type of foundational model for parsing human anatomy in medical images that works across different modalities. It supports supervised or unsupervised training and can perform matching, registration, classification, or segmentation with or without user interaction. We achieve this by training a neural network estimator that maps query locations to atlas coordinates via regression. Efficiency is improved by sparsely sampling the input, enabling response times of less than 1 ms without additional accelerator hardware. We demonstrate the utility of the algorithm on both CT and MRI modalities.
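The core idea described above — regressing atlas coordinates for a query location from sparsely sampled intensities around it — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the helper names (`sparse_sample`, `CoordRegressor`), the random offset pattern, and the tiny untrained MLP are all assumptions made for demonstration.

```python
import numpy as np

def sparse_sample(volume, query, offsets):
    """Sample `volume` at a fixed set of sparse offsets around `query`.

    volume  : 3-D intensity array
    query   : (z, y, x) integer voxel coordinate
    offsets : (K, 3) integer offsets defining the sparse pattern (assumed)
    Returns a length-K feature vector; out-of-bounds probes read as 0.
    """
    feats = np.zeros(len(offsets), dtype=np.float32)
    for i, off in enumerate(offsets):
        p = np.asarray(query) + off
        if np.all(p >= 0) and np.all(p < volume.shape):
            feats[i] = volume[tuple(p)]
    return feats

class CoordRegressor:
    """Tiny MLP mapping sparse intensity samples to a 3-D atlas coordinate.

    Stands in for the paper's lightweight regression network; weights here
    are random (untrained) and serve only to show the input/output shapes.
    """
    def __init__(self, k, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.standard_normal((k, hidden)).astype(np.float32) * 0.1
        self.b1 = np.zeros(hidden, dtype=np.float32)
        self.w2 = rng.standard_normal((hidden, 3)).astype(np.float32) * 0.1
        self.b2 = np.zeros(3, dtype=np.float32)

    def __call__(self, feats):
        h = np.maximum(feats @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        return h @ self.w2 + self.b2                    # atlas (z, y, x)

# Usage: one query point, 50 sparse probes in a synthetic volume.
rng = np.random.default_rng(1)
vol = rng.random((64, 64, 64)).astype(np.float32)
offsets = rng.integers(-8, 9, size=(50, 3))
feats = sparse_sample(vol, (32, 32, 32), offsets)
coords = CoordRegressor(k=50)(feats)
print(coords.shape)  # (3,)
```

Because each query touches only a handful of voxels and a small network, per-point inference of this kind is what makes sub-millisecond CPU latency plausible.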