🤖 AI Summary
To address the slow inference of diffusion-based pansharpening for remote sensing image fusion, which is caused by multi-step sampling, and the performance degradation that existing acceleration techniques introduce, this paper proposes ResPanDiff, a residual-guided Markov chain diffusion framework. Its core contributions are threefold: (1) a Markov chain that transitions from noisy residuals to the spectral-spatial residual between low-resolution multispectral (LRMS) and high-resolution multispectral (HRMS) images; (2) a latent-space design that extracts more features at the encoding stage, coupled with Shallow Cond-Injection (SC-I) for obtaining condition-injected hidden features; and (3) dedicated loss functions that guide residual generation. ResPanDiff achieves state-of-the-art performance on benchmark datasets, including WHU, GF-2, and QB, with only 15 sampling steps, cutting the number of sampling steps by over 90% relative to benchmark diffusion models without sacrificing accuracy.
📝 Abstract
Diffusion-based pansharpening is constrained primarily by its slow inference speed, which results from the large number of sampling steps. Existing techniques for accelerating sampling often compromise performance when fusing multi-source images. To ease this limitation, we introduce a novel and efficient diffusion model named Diffusion Model for Pansharpening by Inferring Residual Inference (ResPanDiff), which significantly reduces the number of diffusion steps without sacrificing performance on the pansharpening task. In ResPanDiff, we propose a Markov chain that transitions from noisy residuals to the residuals between the low-resolution multispectral (LRMS) and high-resolution multispectral (HRMS) images, thereby reducing the number of sampling steps and enhancing performance. Additionally, we design a latent space that helps the model extract more features at the encoding stage, Shallow Cond-Injection (SC-I) that helps the model obtain condition-injected hidden features with higher dimensions, and loss functions that better guide the residual generation task, enabling the model to achieve superior performance in residual generation. Furthermore, experimental evaluations on pansharpening datasets demonstrate that the proposed method achieves superior outcomes compared to recent state-of-the-art (SOTA) techniques while requiring only 15 sampling steps, a reduction of over 90% in steps compared with the benchmark diffusion models. Our experiments also include thorough discussions and ablation studies to underscore the effectiveness of our approach.
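To make the residual formulation concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes the HRMS image can be written as an upsampled LRMS image plus a residual, and it runs a short reverse chain that starts from Gaussian noise and moves toward a predicted residual. The names `upsample_nearest`, `sample_residual`, `pansharpen`, and `predict_residual`, the nearest-neighbour upsampler, and the linear noise schedule are all hypothetical choices made for illustration; the paper's actual transition kernel, network, and schedule are not specified in the abstract.

```python
import numpy as np


def upsample_nearest(lrms, scale):
    """Nearest-neighbour upsampling of a (C, H, W) array (stand-in for any interpolator)."""
    return lrms.repeat(scale, axis=1).repeat(scale, axis=2)


def sample_residual(predict_residual, lrms_up, pan, num_steps=15, seed=0):
    """Toy reverse chain over residuals.

    predict_residual is a hypothetical callable (x_t, lrms_up, pan, t) -> residual
    estimate; in practice it would be a trained network. The linear noise schedule
    below is an assumption for illustration only.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(lrms_up.shape)  # start from a pure-noise "residual"
    for t in range(num_steps, 0, -1):
        r_hat = predict_residual(x, lrms_up, pan, t)
        sigma = (t - 1) / num_steps  # noise level kept after this step; 0 at t = 1
        x = r_hat + sigma * rng.standard_normal(lrms_up.shape)
    return x


def pansharpen(predict_residual, lrms, pan, scale=4, num_steps=15):
    """Fused estimate = upsampled LRMS + sampled residual."""
    lrms_up = upsample_nearest(lrms, scale)
    residual = sample_residual(predict_residual, lrms_up, pan, num_steps)
    return lrms_up + residual
```

With an oracle predictor that always returns the true residual, the chain recovers the HRMS image exactly, because the final step adds no noise. A learned predictor would only approximate this, and the quality of that approximation is what the proposed losses and conditioning are meant to improve.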