🤖 AI Summary
This work addresses two limitations of existing frequency-domain pansharpening methods for remote sensing images: they rely on fixed frequency filters, which adapt poorly to complex, spatially varying frequency distributions, and their denoising strategies make insufficient use of frequency information. To overcome these limitations, the authors propose CGFformer, a framework built around a K-means clustering-guided adaptive frequency separation mechanism. A dual-stream refinement module with Transformer-based cross-attention then suppresses both frequency-relevant and frequency-irrelevant noise, and a dedicated frequency-spatial fusion module enhances detail and supports interaction between the frequency and spatial domains to improve the reconstruction of spatial structures. Experiments on multiple benchmark datasets show notable improvements over existing pansharpening approaches.
📝 Abstract
Pansharpening aims to generate high-resolution multispectral (HRMS) images by fusing low-resolution multispectral (LRMS) images with high-resolution panchromatic (PAN) images. However, mainstream frequency-based pansharpening methods employ fixed frequency filters, which cannot precisely adapt to the complex and spatially diverse frequency distributions in PAN and MS images. Furthermore, existing denoising strategies make insufficient use of frequency components and struggle to suppress different noise types accurately. To address these challenges, we propose CGFformer, a cluster-guidance frequency Transformer that focuses on varying frequency distributions and on the interactions between frequency and spatial components. Specifically, we design an adaptive separation module that integrates local features and non-local information through K-means clustering, enabling more precise separation of high- and low-frequency components. Subsequently, we introduce a dual-stream refinement module combined with Transformer-based cross-attention to remove various types of noise, allowing the network to jointly suppress frequency-relevant and frequency-irrelevant disturbances. In addition, we develop a frequency-spatial fusion module designed to enhance detail and facilitate spatial-frequency interaction, ensuring more effective reconstruction of spatial structures in the fused results. Extensive experiments on multiple benchmark datasets demonstrate that the proposed CGFformer achieves notable improvements over existing pansharpening approaches.
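
To make the idea of cluster-guided frequency separation concrete, the following is a toy one-dimensional sketch in plain Python. It is not the CGFformer adaptive separation module, whose details the abstract does not give. It assumes that samples are grouped by K-means on a local-variance feature and that each cluster is low-pass filtered with its own box-filter width, with the high-frequency part taken as the residual. All function names, the choice of feature, and the filter radii are hypothetical and chosen only for illustration.

```python
def moving_average(signal, radius):
    """Low-pass a 1-D signal with a box filter of half-width `radius`."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def local_variance(signal, radius):
    """Per-sample variance over a small window, used as the clustering feature."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - radius):min(n, i + radius + 1)]
        mean = sum(window) / len(window)
        out.append(sum((v - mean) ** 2 for v in window) / len(window))
    return out


def kmeans_1d(values, k=2, iters=20):
    """Plain Lloyd's algorithm on scalars with a deterministic initialisation."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest center for every value.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels, centers


def cluster_guided_split(signal, radii=(4, 1), feat_radius=2):
    """Split `signal` into (low, high) using one filter radius per cluster.

    Assumption for illustration: clusters are ranked by local variance, and
    the cluster with rank r is filtered with radii[r].
    """
    labels, centers = kmeans_1d(local_variance(signal, feat_radius), k=len(radii))
    order = sorted(range(len(centers)), key=lambda j: centers[j])
    rank = {j: r for r, j in enumerate(order)}
    lows = [moving_average(signal, r) for r in radii]
    ranks = [rank[lab] for lab in labels]
    low = [lows[r][i] for i, r in enumerate(ranks)]
    high = [s - l for s, l in zip(signal, low)]  # residual = high-frequency part
    return low, high, ranks


# A flat region followed by a rapidly oscillating region.
signal = [0.0] * 32 + [(-1.0) ** i for i in range(32)]
low, high, labels = cluster_guided_split(signal)
```

In this toy signal the two regions fall into different clusters, so each region is filtered with a different width, whereas a fixed filter would apply one width everywhere. By construction, `low[i] + high[i]` reproduces `signal[i]`.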