🤖 AI Summary
ADMM-FFT tomographic reconstruction suffers from high computational cost and substantial memory overhead. To address this, we propose memoized low-rank reconstruction (mLR), the first approach to integrate memoization into the ADMM-FFT iterative framework by caching repeated FFT computation results. mLR further incorporates cross-GPU variable offloading, hierarchical CPU memory management, and multi-node, multi-GPU parallelization to significantly improve memory efficiency and scalability. The method supports reconstructions up to 2K×2K×2K voxels and operates efficiently under memory constraints. Experimental evaluation demonstrates that mLR achieves an average speedup of 52.8% over baseline ADMM-FFT, with a maximum acceleration of 65.4%, while preserving reconstruction accuracy and exhibiting strong scalability across diverse hardware configurations.
📝 Abstract
ADMM-FFT is an iterative method with high reconstruction accuracy for laminography, but it suffers from excessive computation time and large memory consumption. We introduce mLR, which employs memoization to replace time-consuming Fast Fourier Transform (FFT) operations, based on a unique observation that similar FFT operations recur across iterations of ADMM-FFT. We introduce a series of techniques that make the application of memoization to ADMM-FFT performance-beneficial and scalable. We also introduce variable offloading to save CPU memory and to scale ADMM-FFT across GPUs within and across nodes. Using mLR, we are able to scale ADMM-FFT to an input problem of 2K×2K×2K, the largest input problem ever handled by an ADMM-FFT solution to laminography reconstruction under limited memory; mLR brings a 52.8% performance improvement on average (up to 65.4%) compared to the original ADMM-FFT.