AI Summary
To address the challenging end-to-end reconstruction of 3D CT volumes from sparse-view 2D X-ray images, this paper proposes a dual-view collaborative guidance framework based on latent diffusion. Methodologically, we design a view-parameter-guided feature encoder to align features between the X-ray and CT spaces; introduce a novel view synthesis module that generates an auxiliary X-ray projection, forming a hybrid conditioning input of real and synthetic views; and jointly optimize the diffusion process via feature-level alignment and CT latent representation decoding. Our key contributions are the dual-view collaborative guidance paradigm and a view-parameter-driven cross-modal feature alignment strategy. Extensive experiments on multiple benchmarks demonstrate significant improvements over state-of-the-art methods in both structural fidelity and perceptual quality, validating the effectiveness and robustness of our approach under sparse-view settings.
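To make the joint optimization above concrete, here is a minimal PyTorch sketch of a combined training objective: a latent diffusion denoising term plus feature-level alignment and CT decoding terms. The loss forms, weights, and tensor names are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical joint objective: diffusion denoising + feature-level alignment
# + CT latent decoding. All loss forms and weights are assumptions.
import torch
import torch.nn.functional as F

def joint_loss(eps_pred, eps, xray_feat, ct_feat, ct_pred, ct_gt,
               w_align=0.1, w_decode=1.0):
    """Combine the three training terms described in the summary."""
    l_diff = F.mse_loss(eps_pred, eps)        # latent diffusion denoising loss
    l_align = F.mse_loss(xray_feat, ct_feat)  # feature-level alignment loss
    l_decode = F.l1_loss(ct_pred, ct_gt)      # decoded CT reconstruction loss
    return l_diff + w_align * l_align + w_decode * l_decode
```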
Abstract
Directly reconstructing a 3D CT volume from few-view 2D X-rays with an end-to-end deep learning network is challenging, as each X-ray image is merely a projection view of the 3D CT volume. In this work, we facilitate the complex mapping from 2D X-ray images to 3D CT by incorporating new view synthesis, and reduce the learning difficulty through view-guided feature alignment. Specifically, we propose a dual-view guided diffusion model (DVG-Diffusion), which couples a real input X-ray view with a synthesized new X-ray view to jointly guide CT reconstruction. First, a novel view-parameter-guided encoder captures features from X-rays that are spatially aligned with CT. Next, we concatenate the extracted dual-view features as conditions for the latent diffusion model to learn and refine the CT latent representation. Finally, the CT latent representation is decoded into a CT volume in pixel space. By incorporating view-parameter-guided encoding and dual-view guided CT reconstruction, DVG-Diffusion achieves an effective balance between high fidelity and perceptual quality in CT reconstruction. Experimental results demonstrate that our method outperforms state-of-the-art methods. We also present a comprehensive analysis and discussion of views and reconstruction based on our experiments.
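The following is a minimal, untrained PyTorch sketch of the pipeline the abstract describes: new view synthesis, view-parameter-guided dual-view encoding, conditional refinement of the CT latent, and latent-to-volume decoding. All module designs, shapes, and the gating-based view-parameter conditioning are simplifying assumptions for illustration, not the paper's actual architecture.

```python
# A toy end-to-end sketch of the DVG-Diffusion pipeline described above.
# Every module here is an illustrative stand-in, not the paper's design.
import torch
import torch.nn as nn

class ViewGuidedEncoder(nn.Module):
    """Encodes one X-ray view into CT-aligned features, modulated by its
    view parameter (here a single projection angle; an assumption)."""
    def __init__(self, feat_ch=32):
        super().__init__()
        self.conv = nn.Conv2d(1, feat_ch, 3, padding=1)
        self.angle_mlp = nn.Linear(1, feat_ch)

    def forward(self, xray, angle):           # xray: (B,1,H,W), angle: (B,1)
        gate = torch.sigmoid(self.angle_mlp(angle))[..., None, None]
        return self.conv(xray) * gate         # view-parameter-gated features

class LatentDenoiser(nn.Module):
    """Toy denoiser: predicts a latent update from the noisy CT latent
    concatenated with the dual-view condition."""
    def __init__(self, z_ch=8, cond_ch=64):
        super().__init__()
        self.net = nn.Conv2d(z_ch + cond_ch, z_ch, 3, padding=1)

    def forward(self, z, cond):
        return self.net(torch.cat([z, cond], dim=1))

class CTDecoder(nn.Module):
    """Decodes the CT latent into a volume by mapping latent channels to
    depth slices (a gross simplification of a real latent decoder)."""
    def __init__(self, z_ch=8, depth=16):
        super().__init__()
        self.conv = nn.Conv2d(z_ch, depth, 3, padding=1)

    def forward(self, z):                     # (B, z_ch, H, W)
        return self.conv(z).unsqueeze(1)      # (B, 1, D, H, W) CT volume

def reconstruct(xray, angle, synthesizer, encoder, denoiser, decoder,
                syn_angle, steps=4):
    """One schematic forward pass with dual-view guided latent refinement."""
    syn_view = synthesizer(xray)                            # new view synthesis
    cond = torch.cat([encoder(xray, angle),
                      encoder(syn_view, syn_angle)], 1)     # dual-view condition
    z = torch.randn(xray.size(0), 8, *xray.shape[-2:])      # initial CT latent
    for _ in range(steps):                                  # schematic denoising
        z = z - denoiser(z, cond)
    return decoder(z)                                       # latent -> volume

# Toy usage on random data (shape check only; nothing is trained).
xray = torch.randn(2, 1, 16, 16)
angle = torch.zeros(2, 1)                 # real view at 0 rad
syn_angle = torch.full((2, 1), 1.5708)    # synthesized lateral view (~90 deg)
enc, den, dec = ViewGuidedEncoder(), LatentDenoiser(), CTDecoder()
synth = nn.Conv2d(1, 1, 3, padding=1)     # stand-in view synthesis module
ct = reconstruct(xray, angle, synth, enc, den, dec, syn_angle)
print(ct.shape)                           # torch.Size([2, 1, 16, 16, 16])
```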