🤖 AI Summary
This work addresses three limitations in pansharpening: the high (quadratic) computational cost of frequency-domain methods built on standard attention, their limited ability to exploit the regional sparsity of remote sensing imagery, and the poor adaptability of existing spatial enhancement strategies. To overcome them, we propose the Region-Aware Fusion Network (RAFNet), which jointly models spatial and frequency information by combining the discrete wavelet transform with K-means clustering to construct dynamic, region-adaptive convolution kernels. RAFNet also incorporates a semantic-clustering-guided sparse attention mechanism to reduce computational redundancy. Extensive experiments on multiple benchmark datasets show that RAFNet outperforms state-of-the-art methods in both reduced-resolution and full-resolution evaluations, producing high-quality high-resolution multispectral images.
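To make the region-adaptive kernel idea concrete, the NumPy sketch below chains the three steps named above: a one-level 2D DWT, K-means over the directional detail responses to partition the image into regions, and a per-region 3×3 kernel applied at every pixel. The wavelet is not specified in the text, so Haar is an assumption, as are the feature choice (absolute detail coefficients) and the random linear map that stands in for a learned kernel generator. The functions `haar_dwt2`, `kmeans`, and `region_adaptive_conv` are hypothetical illustrations, not RAFNet's implementation.

```python
import numpy as np

# Hypothetical sketch, not the RAFNet SAR module. Haar wavelet, features and
# kernel generator are illustrative assumptions.

def haar_dwt2(x):
    """One-level orthonormal 2D Haar DWT of an (H, W) array with even H, W."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a - b + c - d) / 2.0  # left-right differences: responds to vertical edges
    hl = (a + b - c - d) / 2.0  # top-bottom differences: responds to horizontal edges
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh

def kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd's K-means on an (N, F) array; returns labels and centers."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]  # copy
    for _ in range(iters):
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(0)
    return labels, centers

def region_adaptive_conv(x, labels, kernels):
    """Filter x (H, W) so each pixel uses the 3x3 kernel of its region label.

    Computes cross-correlation, as CNN 'convolution' layers do."""
    padded = np.pad(x, 1, mode="reflect")
    patches = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))  # (H, W, 3, 3)
    return (patches * kernels[labels]).sum(axis=(-2, -1))

# Usage on a random stand-in for a PAN image.
rng = np.random.default_rng(0)
pan = rng.random((32, 32))
ll, lh, hl, hh = haar_dwt2(pan)
feats = np.stack([np.abs(lh), np.abs(hl), np.abs(hh)], axis=-1).reshape(-1, 3)
labels, centers = kmeans(feats, k=4)
labels = labels.reshape(ll.shape)
weight = 0.1 * rng.standard_normal((3, 9))  # hypothetical "learned" generator
kernels = (centers @ weight).reshape(-1, 3, 3)
kernels[:, 1, 1] += 1.0  # bias each kernel towards identity for stability
enhanced = region_adaptive_conv(ll, labels, kernels)
```

Because each pixel selects its kernel by region label, two pixels with identical neighbourhoods are filtered differently when they fall in different clusters, which is what separates this scheme from a static convolution.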
📝 Abstract
Pansharpening aims to generate high-resolution multispectral (HRMS) images by fusing low-resolution multispectral (LRMS) and high-resolution panchromatic (PAN) images. Although deep learning has advanced this field, mainstream frequency-based methods rely on standard scaled dot-product attention, which incurs quadratic computational complexity and fails to exploit the inherent regional sparsity of remote sensing imagery. Furthermore, existing spatial enhancement strategies typically employ static convolution kernels, which struggle to adapt to the complex frequency and regional variations of PAN and MS images. To address these bottlenecks, we propose the Region-Aware Fusion Network (RAFNet), which synergistically models spatial and frequency information. Specifically, we design a Spatial Adaptive Refinement (SAR) module that leverages the discrete wavelet transform (DWT) for directional frequency separation and K-means clustering for regional partitioning; together, these enable the dynamic construction of region-specific adaptive convolution kernels and thus spatially and frequency-adaptive feature enhancement. Moreover, we introduce a Clustered Frequency Aggregation (CFA) module whose sparse attention is guided by the semantic clusters; this region-aware strategy drastically reduces computational redundancy while preserving high-quality frequency feature extraction. Finally, we integrate these modules into a progressive, multi-level spatial-frequency network architecture that facilitates robust interaction and accurate image reconstruction. Extensive experiments on multiple benchmark datasets demonstrate that RAFNet significantly outperforms state-of-the-art pansharpening methods in both reduced- and full-resolution assessments. The code is available at https://github.com/PatrickNod/RAFNet.
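The abstract does not spell out how the CFA module sparsifies attention. One plausible reading, assumed here purely for illustration, is that each token attends only to tokens in its own cluster, so the cost falls from N² query-key pairs to the sum of squared cluster sizes. The NumPy sketch below uses hypothetical functions `dense_attention` and `clustered_attention`, with random tokens and fixed balanced labels in place of wavelet features and K-means assignments.

```python
import numpy as np

# Hypothetical sketch of cluster-restricted attention; not the RAFNet CFA module.

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_attention(q, k, v):
    """Standard scaled dot-product attention over all tokens: O(N^2) scores."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def clustered_attention(q, k, v, labels):
    """Each token attends only to tokens that share its cluster label.

    Returns the output and the number of query-key pairs actually scored."""
    out = np.zeros_like(v)
    pairs = 0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        out[idx] = dense_attention(q[idx], k[idx], v[idx])
        pairs += idx.size ** 2
    return out, pairs

# Usage with random tokens standing in for frequency features (assumption).
rng = np.random.default_rng(0)
N, d = 256, 16
q, k, v = (rng.standard_normal((N, d)) for _ in range(3))
labels = np.arange(N) % 4  # four balanced clusters of 64 tokens
out, pairs = clustered_attention(q, k, v, labels)
print(pairs, N * N)  # 16384 65536
```

With 256 tokens in four clusters of 64, the sketch scores 4 × 64² = 16,384 pairs instead of 256² = 65,536, a fourfold saving; with k balanced clusters the saving is roughly k-fold. Looping over clusters gives the same result as dense attention with cross-cluster scores masked to -inf, but never computes those scores.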