🤖 AI Summary
This work addresses the limitations of existing diffusion models for pansharpening, which operate in pixel space, require sensor-specific training, and consequently suffer from high latency and poor cross-sensor generalization. The authors propose the first sensor-agnostic latent-space diffusion framework for the task: a single-channel variational autoencoder compresses multispectral images into compact latent representations, and physical spectral priors are integrated through a bidirectional interaction control architecture to enable efficient, accurate pansharpening in latent space. A lightweight cross-spectral attention mechanism further strengthens spectral fidelity. Experiments demonstrate that the method outperforms current diffusion-based approaches on the GaoFen-2, QuickBird, and WorldView-3 datasets, achieves 2–3× faster inference, and, for the first time, enables zero-shot cross-sensor pansharpening.
📄 Abstract
Recently, diffusion models have brought novel insights to pan-sharpening and notably boosted fusion precision. However, most existing models perform diffusion in pixel space and train distinct models for different multispectral (MS) sensors, suffering from high latency and sensor-specific limitations. In this paper, we present SALAD-Pan, a sensor-agnostic latent-space diffusion method for efficient pansharpening. Specifically, SALAD-Pan trains a band-wise single-channel VAE to encode high-resolution multispectral (HRMS) images into compact latent representations, supporting MS images with arbitrary channel counts and establishing a basis for acceleration. Spectral physical properties, along with the PAN and MS images, are then injected into the diffusion backbone through unidirectional and bidirectional interactive control structures respectively, achieving high-precision fusion during the diffusion process. Finally, a lightweight cross-spectral attention module is added to the central layer of the diffusion model, reinforcing inter-band connections to boost spectral consistency and further elevate fusion precision. Experimental results on GaoFen-2, QuickBird, and WorldView-3 demonstrate that SALAD-Pan outperforms state-of-the-art diffusion-based methods across all three datasets, attains a 2–3× inference speedup, and exhibits robust zero-shot (cross-sensor) capability.
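The sensor-agnostic property of the band-wise single-channel VAE can be read as "apply one shared 1-channel encoder to each spectral band independently", so the same model handles 4-band and 8-band imagery without retraining. The minimal pure-Python sketch below illustrates only that idea; the average-pooling `encode_band` is a toy stand-in for the learned VAE encoder, and all names here are illustrative assumptions, not the paper's implementation.

```python
def encode_band(band):
    """Toy single-channel 'encoder': 2x2 average pooling as a placeholder
    for the learned VAE encoder applied to one spectral band."""
    h, w = len(band), len(band[0])
    return [[(band[i][j] + band[i][j + 1] +
              band[i + 1][j] + band[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

def encode_ms(ms_image):
    """Apply the same single-channel encoder to every band, so MS images
    with 4, 8, or any number of channels map to a stack of latents."""
    return [encode_band(band) for band in ms_image]

# The same encoder works unchanged for a 4-band (GaoFen-2/QuickBird-like)
# and an 8-band (WorldView-3-like) image:
four_band = [[[float(c + i + j) for j in range(4)] for i in range(4)]
             for c in range(4)]
eight_band = [[[float(c + i + j) for j in range(4)] for i in range(4)]
              for c in range(8)]
print(len(encode_ms(four_band)), len(encode_ms(eight_band)))  # 4 8
```

Because every band shares the one-channel encoder, the latent stack simply inherits the input's band count, which is what makes cross-sensor (zero-shot) use possible without per-sensor retraining.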