🤖 AI Summary
Existing 3D medical image synthesis methods suffer from low anatomical fidelity, limited axial coverage, and high computational overhead, hindering deployment in resource-constrained clinical settings. To address these challenges, we propose a lightweight 3D CT generation framework based on a 2D multimodal conditional diffusion model. Our approach models consecutive 2D slices as video frames, conditioning jointly on semantic segmentation maps and radiological text reports as anatomical priors, and applies optical flow constraints to ensure inter-slice temporal consistency alongside structural accuracy. During inference, an overlapping-frame strategy enables reconstruction of 3D volumes of arbitrary axial length. Experiments demonstrate that our method reduces GPU memory consumption by ~70% and inference time by ~50% compared to 3D diffusion baselines, while achieving superior anatomical accuracy (Dice score +8.2%) and sequence coherence (optical flow error −32%). The framework further supports privacy-preserving inference and practical clinical deployment.
📝 Abstract
3D medical image generation is essential for data augmentation and patient privacy, calling for reliable and efficient models suited to clinical practice. However, current methods suffer from limited anatomical fidelity, restricted axial length, and substantial computational cost, placing them beyond reach for regions with limited resources and infrastructure. We introduce TRACE, a framework that generates 3D medical images with spatiotemporal alignment using a 2D multimodal-conditioned diffusion approach. TRACE models sequential 2D slices as video frame pairs, combining segmentation priors and radiology reports for anatomical alignment and incorporating optical flow to sustain temporal coherence. During inference, an overlapping-frame strategy links frame pairs into a flexible-length sequence, which is reconstructed into a spatiotemporally and anatomically aligned 3D volume. Experimental results demonstrate that TRACE balances computational efficiency with anatomical fidelity and spatiotemporal consistency. Code is available at: https://github.com/VinyehShaw/TRACE.
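To make the overlapping-frame idea concrete, here is a minimal sketch of how fixed-length generated windows of slices could be stitched into an arbitrary-length volume. The windowing geometry and the simple averaging rule in overlap regions are assumptions for illustration, not TRACE's actual merging procedure, and `stitch_overlapping_windows` is a hypothetical helper name.

```python
import numpy as np

def stitch_overlapping_windows(windows, overlap):
    """Merge fixed-length slice windows (each shaped [T, H, W]) into one
    volume, averaging slices where consecutive windows overlap.

    NOTE: averaging in the overlap is an illustrative assumption; the
    paper's actual merging rule may differ.
    """
    T, H, W = windows[0].shape
    step = T - overlap                      # axial stride between windows
    total = step * (len(windows) - 1) + T   # final axial length
    volume = np.zeros((total, H, W), dtype=np.float64)
    counts = np.zeros((total, 1, 1), dtype=np.float64)
    for i, w in enumerate(windows):
        start = i * step
        volume[start:start + T] += w        # accumulate each window
        counts[start:start + T] += 1.0      # track how many windows cover each slice
    return volume / counts                  # average in overlapped regions

# Toy usage: three 4-slice windows, overlapping by 1 slice -> 10 slices total.
wins = [np.full((4, 2, 2), float(i)) for i in range(3)]
vol = stitch_overlapping_windows(wins, overlap=1)
print(vol.shape)  # (10, 2, 2)
```

In this toy run, the shared slice between window 0 (all zeros) and window 1 (all ones) comes out as 0.5, showing how the overlap smooths the seam between independently generated frame sequences.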