GLIMPSE: Generalized Local Imaging with MLPs

📅 2024-01-01
🏛️ arXiv.org
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
To address the poor generalization of global CNNs and the memory and compute costs that grow with image resolution in sparse-view CT reconstruction, this paper proposes a local MLP architecture that reconstructs each pixel using only the projection data associated with that pixel's neighborhood. The architecture is fully differentiable and does not rely on a large receptive field. This enables end-to-end geometric calibration when projection angles are uncalibrated, and keeps memory consumption nearly independent of image resolution. On 1024×1024 images, a single training iteration requires only about 5 GB of GPU memory and runs in under a second. Experiments show substantially better out-of-distribution (OOD) generalization than U-Net, with in-distribution (ID) performance that matches or exceeds it.

📝 Abstract
Deep learning is the current de facto state of the art in tomographic imaging. A common approach is to feed the result of a simple inversion, for example the backprojection, to a convolutional neural network (CNN) which then computes the reconstruction. Despite strong results on 'in-distribution' test data similar to the training data, backprojection from sparse-view data delocalizes singularities, so these approaches require a large receptive field to perform well. As a consequence, they overfit to certain global structures, which leads to poor generalization on out-of-distribution (OOD) samples. Moreover, their memory complexity and training time scale unfavorably with image resolution, making them impractical for application at realistic clinical resolutions, especially in 3D: a standard U-Net requires a substantial 140 GB of memory and 2600 seconds per epoch on a research-grade GPU when training on 1024×1024 images. In this paper, we introduce GLIMPSE, a local processing neural network for computed tomography which reconstructs a pixel value by feeding only the measurements associated with the neighborhood of the pixel to a simple MLP. While achieving performance comparable to or better than that of successful CNNs like the U-Net on in-distribution test data, GLIMPSE significantly outperforms them on OOD samples while maintaining a memory footprint almost independent of image resolution; 5 GB of memory suffices to train on 1024×1024 images. Further, we built GLIMPSE to be fully differentiable, which enables feats such as recovery of accurate projection angles if they are out of calibration.
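The local processing idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes a parallel-beam geometry, in which a point (x, y) projects onto detector coordinate t = x cos θ + y sin θ at angle θ, and it uses linear interpolation along the detector axis. The function names `local_measurements` and `mlp_pixel`, the 3×3 patch, the hidden width, and the random weights are all hypothetical choices made for illustration.

```python
import numpy as np

def local_measurements(sinogram, angles, x, y, patch=3, spacing=1.0):
    """Gather the sinogram samples seen by a patch x patch neighborhood of (x, y).

    Assumes a parallel-beam sinogram of shape (n_angles, n_det) whose detector
    is centered on the rotation axis, with coordinates in detector-pixel units.
    Returns an array of shape (n_angles, patch * patch).
    """
    n_angles, n_det = sinogram.shape
    center = (n_det - 1) / 2.0
    offsets = (np.arange(patch) - (patch - 1) / 2.0) * spacing
    dx, dy = np.meshgrid(offsets, offsets, indexing="xy")
    px = x + dx.ravel()
    py = y + dy.ravel()
    # Detector position of every neighbor at every angle: t = x cos(theta) + y sin(theta).
    t = np.outer(np.cos(angles), px) + np.outer(np.sin(angles), py) + center
    t = np.clip(t, 0.0, n_det - 1.0)
    lo = np.floor(t).astype(int)
    hi = np.minimum(lo + 1, n_det - 1)
    w = t - lo
    rows = np.arange(n_angles)[:, None]
    # Linear interpolation between the two nearest detector bins.
    return (1.0 - w) * sinogram[rows, lo] + w * sinogram[rows, hi]

def mlp_pixel(features, w1, b1, w2, b2):
    """Map the flattened local measurements to one pixel value (one hidden ReLU layer)."""
    hidden = np.maximum(features.ravel() @ w1 + b1, 0.0)
    return float(hidden @ w2 + b2)

# Illustrative data only: a random sinogram and untrained random weights.
rng = np.random.default_rng(0)
angles = np.linspace(0.0, np.pi, 30, endpoint=False)
sinogram = rng.standard_normal((30, 129))
feats = local_measurements(sinogram, angles, x=10.0, y=-4.0)
w1 = rng.standard_normal((feats.size, 16)) * 0.1
b1 = np.zeros(16)
w2 = rng.standard_normal(16) * 0.1
value = mlp_pixel(feats, w1, b1, w2, 0.0)
```

The point of the sketch is that the network input for one pixel has a fixed size (number of angles times patch size), no matter how large the image is.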

Problem

Research questions and friction points this paper is trying to address.

Overfitting to large-scale structures in CT imaging
Poor generalization on out-of-distribution samples
High memory complexity in multiscale CNNs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Local coordinate-based neural network for CT
Processes neighborhood measurements per pixel
Memory-efficient training on high-resolution images
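
One way to see why a per-pixel model can have a memory footprint that barely depends on resolution is to look at the size of the network input for a batch of pixels. The sketch below assumes that training draws a fixed-size random batch of pixel locations per iteration; this is an assumption made for illustration, not a detail confirmed by the abstract. It also assumes a parallel-beam geometry with one interpolated sample per pixel and angle. The helper `gather_batch`, the batch size of 256, and the detector width are hypothetical.

```python
import numpy as np

def gather_batch(sinogram, angles, xs, ys):
    """Interpolate the sinogram at t = x cos(theta) + y sin(theta) for each pixel.

    sinogram has shape (n_angles, n_det); xs and ys have shape (batch,).
    Returns an array of shape (batch, n_angles).
    """
    n_angles, n_det = sinogram.shape
    center = (n_det - 1) / 2.0
    t = np.outer(xs, np.cos(angles)) + np.outer(ys, np.sin(angles)) + center
    t = np.clip(t, 0.0, n_det - 1.0)
    lo = np.floor(t).astype(int)
    hi = np.minimum(lo + 1, n_det - 1)
    w = t - lo
    cols = np.arange(n_angles)[None, :]
    return (1.0 - w) * sinogram[cols, lo] + w * sinogram[cols, hi]

rng = np.random.default_rng(0)
angles = np.linspace(0.0, np.pi, 30, endpoint=False)
batch_size = 256
shapes = {}
for size in (128, 1024):
    # Detector wide enough to cover the image diagonal (illustrative choice).
    n_det = int(np.ceil(size * np.sqrt(2))) + 1
    sinogram = rng.standard_normal((30, n_det))
    half = (size - 1) / 2.0
    xs = rng.uniform(-half, half, batch_size)
    ys = rng.uniform(-half, half, batch_size)
    shapes[size] = gather_batch(sinogram, angles, xs, ys).shape
```

In this sketch the input tensor has shape (256, 30) for both the 128×128 and the 1024×1024 case. Only the stored sinogram grows with resolution, whereas a CNN operating on the full backprojected image holds activations proportional to the number of pixels.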