AI Summary
This work addresses the challenge of recovering high-frequency anatomical details in cone-beam CT reconstruction under extremely sparse-view conditions. To this end, the authors propose DuFal, a dual-path architecture that processes features jointly in the frequency and spatial domains. Its core component is a High-Local Factorized Fourier Neural Operator with two branches: one models global high-frequency patterns, and the other operates on spatially partitioned patches to model local high-frequency features. A spectral-channel factorization reduces the operator's parameter count. Spatial and frequency features are fused via a cross-attention mechanism, and the fused features are decoded through an intensity field decoding pipeline to reconstruct the CT volume. Experiments on the LUNA16 and ToothFairy datasets show that the proposed method significantly outperforms existing approaches, particularly in recovering high-frequency anatomical structures under extremely sparse angular sampling.
Abstract
Sparse-view Cone-Beam Computed Tomography reconstruction from limited X-ray projections remains a challenging problem in medical imaging due to the inherent undersampling of fine-grained anatomical details, which correspond to high-frequency components. Conventional CNN-based methods often struggle to recover these fine structures, as they are typically biased toward learning low-frequency information. To address this challenge, this paper presents DuFal (Dual-Frequency-Aware Learning), a novel framework that integrates frequency-domain and spatial-domain processing via a dual-path architecture. The core innovation lies in our High-Local Factorized Fourier Neural Operator, which comprises two complementary branches: a Global High-Frequency Enhanced Fourier Neural Operator that captures global frequency patterns and a Local High-Frequency Enhanced Fourier Neural Operator that processes spatially partitioned patches to preserve spatial locality that might be lost in global frequency analysis. To improve efficiency, we design a Spectral-Channel Factorization scheme that reduces the Fourier Neural Operator parameter count. We also design a Cross-Attention Frequency Fusion module to integrate spatial and frequency features effectively. The fused features are then decoded through a Feature Decoder to produce projection representations, which are subsequently processed through an Intensity Field Decoding pipeline to reconstruct a final Computed Tomography volume. Experimental results on the LUNA16 and ToothFairy datasets demonstrate that DuFal significantly outperforms existing state-of-the-art methods in preserving high-frequency anatomical features, particularly under extremely sparse-view settings.
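The complementary roles of the global and local branches can be illustrated with a toy 1D sketch. This is not the DuFal operator: there are no learned weights, and a fixed low-pass mask stands in for a Fourier layer that keeps only a limited number of modes. All names (`truncate_modes`, `global_branch`, `local_branch`, `KEEP`, `PATCH`) and the test signal are assumptions made for illustration. The sketch only shows that, under the same mode budget per transform, a patch-wise transform retains higher spatial frequencies than a transform over the whole signal.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse DFT; keeps the real part (spectra here are conjugate-symmetric)."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def truncate_modes(x, keep):
    """Zero every mode whose frequency index is >= keep (hypothetical stand-in
    for a Fourier layer that only parameterizes the lowest `keep` modes)."""
    n = len(x)
    X = dft(x)
    for k in range(n):
        if min(k, n - k) >= keep:
            X[k] = 0
    return idft(X)


def global_branch(x, keep):
    """One transform over the whole signal."""
    return truncate_modes(x, keep)


def local_branch(x, patch, keep):
    """Independent transforms over non-overlapping patches of length `patch`."""
    out = []
    for start in range(0, len(x), patch):
        out.extend(truncate_modes(x[start:start + patch], keep))
    return out


def sq_err(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


# Toy signal: smooth background plus one sharp spike (a "fine detail").
N, KEEP, PATCH = 32, 4, 8
signal = [0.5 * math.cos(2 * math.pi * t / N) for t in range(N)]
signal[13] += 1.0

global_out = global_branch(signal, KEEP)
local_out = local_branch(signal, PATCH, KEEP)

print("global squared error:", sq_err(signal, global_out))
print("local squared error: ", sq_err(signal, local_out))
```

In this toy setting the global transform smears the spike, because its retained modes cover only the lowest frequencies of the full signal, while the patch-wise transform keeps most of it. Patch-wise processing on its own sees no context beyond each patch, which is consistent with the abstract's description of the two branches as complementary rather than interchangeable.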