🤖 AI Summary
This work addresses the heavy computational cost of applying diffusion models to high-resolution 3D medical image enhancement tasks such as denoising and super-resolution, a cost that stems from long diffusion trajectories over a very large voxel space. The authors propose a sparse voxel-space diffusion framework that lowers training and inference costs by training and sampling on a compact set of uniformly subsampled timesteps. A lightweight Structure-aware Trajectory Modulation (STM) module recalibrates the time embeddings at each network block according to local anatomical content, so that denoising adapts to structure while sharing one sparse schedule. The network predicts the clean volume directly and is supervised in velocity space. Across four datasets spanning CT, PET, and MRI, the approach reports state-of-the-art denoising and super-resolution performance, with up to 10× faster training and no lossy compression of the voxel data.
📝 Abstract
Three-dimensional (3D) medical image enhancement, including denoising and super-resolution, is critical for clinical diagnosis in CT, PET, and MRI. Although diffusion models have shown remarkable success in 2D medical imaging, scaling them to high-resolution 3D volumes remains computationally prohibitive due to lengthy diffusion trajectories over high-dimensional volumetric data. We observe that in conditional enhancement, strong anatomical priors in the degraded input render dense noise schedules largely redundant. Leveraging this insight, we propose a sparse voxel-space diffusion framework that trains and samples on a compact set of uniformly subsampled timesteps. The network predicts clean data directly on the data manifold, supervised in velocity space for stable gradient scaling. A lightweight Structure-aware Trajectory Modulation (STM) module recalibrates time embeddings at each network block based on local anatomical content, enabling structure-adaptive denoising over the shared sparse schedule. Operating directly in voxel space, our framework preserves fine anatomical detail without lossy compression while achieving up to $10\times$ training acceleration. Experiments on four datasets spanning CT, PET, and MRI demonstrate state-of-the-art performance on both denoising and super-resolution tasks. Our code is publicly available at: https://github.com/mirthAI/sparse-3d-diffusion.
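The two core ideas in the abstract, a uniformly subsampled timestep set and velocity-space supervision of a clean-data prediction, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `sparse_timesteps` and `velocity_space_loss`, the schedule sizes (1000 dense steps, 10 sparse steps), the linear interpolation path $x_t = (1 - t)\,x_0 + t\,\epsilon$, and the `t_min` clamp are all assumptions made here for illustration, since the abstract does not specify them.

```python
import numpy as np


def sparse_timesteps(num_dense: int, num_sparse: int) -> np.ndarray:
    """Pick `num_sparse` evenly spaced indices from a dense schedule (hypothetical helper)."""
    if not 1 <= num_sparse <= num_dense:
        raise ValueError("need 1 <= num_sparse <= num_dense")
    # Evenly spaced positions, endpoints included, rounded to integer indices.
    return np.linspace(0, num_dense - 1, num_sparse).round().astype(int)


def velocity_space_loss(x_pred, x_clean, x_t, t, t_min=1e-3):
    """MSE between the velocities implied by the predicted and true clean volumes.

    Assumption (not from the paper): a linear path
    x_t = (1 - t) * x_clean + t * noise with t in (0, 1],
    whose velocity is (x_t - x_clean) / t.
    """
    t = max(float(t), t_min)          # guard against division by a near-zero t
    v_pred = (x_t - x_pred) / t       # velocity implied by the network's clean estimate
    v_true = (x_t - x_clean) / t      # velocity implied by the ground-truth volume
    return float(np.mean((v_pred - v_true) ** 2))


# Hypothetical sizes: 1000 dense steps reduced to 10 sparse steps.
steps = sparse_timesteps(1000, 10)

rng = np.random.default_rng(0)
x_clean = rng.standard_normal((4, 4, 4))   # toy 3D volume standing in for a scan
noise = rng.standard_normal((4, 4, 4))
t = (steps[5] + 1) / 1000                  # map a sparse index to a time in (0, 1]
x_t = (1 - t) * x_clean + t * noise        # noised volume on the assumed linear path

# A perfect clean-data prediction gives zero velocity-space loss.
loss = velocity_space_loss(x_clean, x_clean, x_t, t)
```

Under the assumed linear path, the velocity-space loss equals the clean-data mean squared error divided by $t^2$, so the same prediction error is weighted more heavily at small $t$. Whether the paper uses this exact path or weighting is not stated in the abstract.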