AI Summary
This work addresses the challenge of anatomical hallucinations and fidelity degradation in zero-shot MRI reconstruction under severely undersampled conditions, where unimodal unconditional generative priors often fail. To overcome this limitation, we propose MPFlow, a novel framework that leverages routinely acquired multimodal clinical MRI as a posterior guide, without retraining the generative prior. Built upon rectified flow, MPFlow jointly steers the sampling process during inference through data consistency and cross-modal feature alignment, while a self-supervised patch-level multimodal pretraining strategy (PAMRI) enables shared cross-modal representations. Experiments on the HCP and BraTS datasets demonstrate that MPFlow achieves comparable image quality to diffusion model baselines using only 20% of the sampling steps, with significantly reduced hallucinations in tumor regions, evidenced by a Dice score improvement exceeding 15%.
Abstract
Zero-shot MRI reconstruction relies on generative priors, but single-modality unconditional priors produce hallucinations under severe ill-posedness. In many clinical workflows, complementary MRI acquisitions (e.g., high-quality structural scans) are routinely available, yet existing reconstruction methods lack mechanisms to leverage this additional information. We propose MPFlow, a zero-shot multi-modal reconstruction framework built on rectified flow that improves anatomical fidelity by incorporating auxiliary MRI modalities at inference time, without retraining the generative prior. Cross-modal guidance is enabled by our proposed self-supervised pretraining strategy, Patch-level Multi-modal MR Image Pretraining (PAMRI), which learns shared representations across modalities. Sampling is jointly guided by data consistency and by cross-modal feature alignment using the pre-trained PAMRI model, systematically suppressing intrinsic and extrinsic hallucinations. Extensive experiments on HCP and BraTS show that MPFlow matches diffusion baselines on image quality using only 20% of the sampling steps while reducing tumor hallucinations by more than 15%, as measured by segmentation Dice score. This demonstrates that cross-modal guidance enables more reliable and efficient zero-shot MRI reconstruction.
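The idea of jointly guiding rectified-flow sampling with data consistency and cross-modal feature alignment can be illustrated with a small toy sketch. It is not the MPFlow implementation, and everything in it is an assumption made for illustration: `toy_velocity` is the closed-form rectified-flow velocity for an isotropic Gaussian prior and stands in for a trained network, `toy_features` uses circular finite differences as a hypothetical substitute for the PAMRI model, the reference modality is assumed to share edges with the target, and the step size `lam`, prior width `sigma`, and the sampling mask are arbitrary choices.

```python
import numpy as np

def fft2c(x):
    return np.fft.fft2(x, norm="ortho")

def ifft2c(k):
    return np.fft.ifft2(k, norm="ortho")

def toy_velocity(x, t, mu, sigma):
    # Assumption: exact E[x1 - x0 | x_t] for x_t = (1 - t) x0 + t x1 with
    # x0 ~ N(0, I) and x1 ~ N(mu, sigma^2 I). Stand-in for a learned prior.
    num = t * sigma**2 - (1.0 - t)
    den = (1.0 - t) ** 2 + (t * sigma) ** 2
    return mu + (num / den) * (x - t * mu)

def toy_features(x):
    # Hypothetical feature extractor: circular finite differences (edges).
    return np.stack([np.roll(x, -1, axis=0) - x, np.roll(x, -1, axis=1) - x])

def feature_mismatch(x, ref):
    return float(np.sum(np.abs(toy_features(x) - toy_features(ref)) ** 2))

def alignment_grad(x, ref):
    # Gradient of 0.5 * ||toy_features(x) - toy_features(ref)||^2 w.r.t. x,
    # obtained by applying the adjoint of the difference operator.
    d = toy_features(x) - toy_features(ref)
    return (np.roll(d[0], 1, axis=0) - d[0]) + (np.roll(d[1], 1, axis=1) - d[1])

def data_consistency(x, y, mask):
    # Overwrite the sampled k-space entries of x with the measurements y.
    return ifft2c(np.where(mask, y, fft2c(x)))

def guided_sample(y, mask, ref, steps=20, lam=0.1, sigma=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(y.shape).astype(complex)  # x0 ~ N(0, I)
    mu = np.zeros_like(x)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        x = x + dt * toy_velocity(x, t, mu, sigma)  # Euler step along the flow
        x = x - lam * alignment_grad(x, ref)        # cross-modal alignment
        x = data_consistency(x, y, mask)            # enforce measured k-space
    return x

# Toy data: a square phantom, with a reference "modality" sharing its edges.
n = 32
x_true = np.zeros((n, n))
x_true[10:22, 8:20] = 1.0
ref = x_true + 0.3

# Keep every 4th k-space column plus a few low-frequency columns.
cols = np.arange(n)
keep = (cols % 4 == 0) | (np.minimum(cols, n - cols) < 3)
mask = np.broadcast_to(keep, (n, n))
y = fft2c(x_true) * mask

recon = guided_sample(y, mask, ref)
unguided = guided_sample(y, mask, ref, lam=0.0)
print("feature mismatch, guided:  ", feature_mismatch(recon, ref))
print("feature mismatch, unguided:", feature_mismatch(unguided, ref))
```

Because the data-consistency projection runs last in each iteration, the output reproduces the measured k-space samples, while the alignment step only affects the unmeasured frequencies. Setting `lam=0.0` disables the cross-modal term, which gives a baseline for comparing how well each result matches the reference features.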