🤖 AI Summary
Existing approaches require separate, task-specific models for multimodal image reconstruction and synthesis, leading to complex pipelines and limited generalization. This work proposes Any2all, a unified framework that, for the first time, formulates both tasks as a virtual image inpainting problem. By training a single unconditional diffusion model on complete multimodal data, the framework enables flexible inference with arbitrary input combinations to generate target-modality images. Without any task-specific customization, Any2all handles high-fidelity reconstruction and synthesis simultaneously on a PET/MR/CT brain dataset, matching specialized models on distortion metrics while surpassing them in perceptual quality.
📝 Abstract
Image reconstruction and image synthesis are both important for handling incomplete multimodal imaging data, but existing methods require a collection of task-specific models, complicating training and deployment workflows. We introduce Any2all, a unified framework that addresses this limitation by formulating these disparate tasks as a single virtual inpainting problem. We train one unconditional diffusion model on the complete multimodal data stack and adapt it at inference time to "inpaint" all target modalities from any combination of available inputs, whether clean images or noisy measurements. We validated Any2all on a PET/MR/CT brain dataset. Our results show that Any2all achieves excellent performance on both multimodal reconstruction and synthesis, consistently producing images whose distortion metrics are competitive with, and whose perceptual quality surpasses, those of specialized methods.
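To make the "virtual inpainting" formulation concrete, below is a minimal sketch of one common way to adapt an unconditional multimodal diffusion model for this kind of inference: at every reverse step, the observed modality channels are re-noised to the current timestep and pasted back into the sample, so only the missing channels are actually generated (a RePaint-style replacement scheme). The names `eps_model`, `known`, and `mask` are illustrative assumptions; the paper's exact sampler may differ.

```python
# Sketch only: assumes PyTorch, a DDPM-style beta schedule, and a hypothetical
# `eps_model(x, t)` that predicts noise for the stacked PET/MR/CT volume.
import torch


@torch.no_grad()
def inpaint_missing_modalities(eps_model, known, mask, betas):
    """known: (B, C, H, W) stacked modalities; mask: 1 where a channel is observed."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn_like(known)  # start the missing channels from pure noise
    for t in reversed(range(len(betas))):
        a_t, ab_t = alphas[t], alpha_bars[t]
        # Re-noise the observed modalities to the current noise level and keep
        # them fixed; only the masked (missing) channels evolve freely.
        noised_known = ab_t.sqrt() * known + (1 - ab_t).sqrt() * torch.randn_like(known)
        x = mask * noised_known + (1 - mask) * x
        # Standard DDPM reverse step with the unconditional noise predictor.
        eps = eps_model(x, torch.full((x.shape[0],), t, device=x.device))
        mean = (x - betas[t] / (1 - ab_t).sqrt() * eps) / a_t.sqrt()
        x = mean + (betas[t].sqrt() * torch.randn_like(x) if t > 0 else 0.0)
    # Paste the clean observed modalities back into the final sample.
    return mask * known + (1 - mask) * x
```

This sketch covers only the clean-image case; reconstruction from noisy measurements would typically add a data-consistency update against the measurement operator at each step, which is omitted here.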