🤖 AI Summary
Traditional convolution in remote sensing image pansharpening suffers from limited multi-scale feature extraction due to fixed square receptive fields and preset sampling densities. To address this, we propose the Adaptive Rectangular Convolution (ARConv) module and an end-to-end network, ARNet. ARConv introduces learnable rectangular kernel dimensions and dynamic sampling point counts, integrated with deformable sampling to enable joint spatial and scale adaptivity. Experiments on multiple benchmark remote sensing datasets demonstrate that ARNet outperforms existing state-of-the-art methods in both quantitative metrics and visual quality. Ablation studies confirm the effectiveness of each design component, while visualization analyses reveal strong physical interpretability: for example, learned kernel shapes align with object orientations and scales in the scene. Overall, ARNet advances pansharpening by enabling data-driven, geometry-aware feature aggregation without relying on handcrafted priors or rigid architectural constraints.
📝 Abstract
Recent advancements in convolutional neural network (CNN)-based techniques for remote sensing pansharpening have markedly enhanced image quality. However, the conventional convolutional modules in these methods have two critical drawbacks. First, the sampling positions in convolution operations are confined to a fixed square window. Second, the number of sampling points is preset and remains unchanged. Given the diverse object sizes in remote sensing images, these rigid parameters lead to suboptimal feature extraction. To overcome these limitations, we introduce a new convolutional module, Adaptive Rectangular Convolution (ARConv). ARConv adaptively learns both the height and width of the convolutional kernel and dynamically adjusts the number of sampling points based on the learned scale. By adapting kernel sizes and sampling locations in this way, ARConv can effectively capture scale-specific features of the various objects within an image. Additionally, we propose ARNet, a network architecture in which ARConv is the primary convolutional module. Extensive evaluations across multiple datasets show that our method outperforms previous techniques in pansharpening performance. Ablation studies and visualizations further confirm the efficacy of ARConv.
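To make the core idea concrete, the following is a minimal sketch of how a learned kernel height and width could be turned into a rectangular grid of sampling offsets whose point count grows with the learned scale. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`odd_count`, `rect_offsets`), the rule that maps a size to an odd point count, and the clamp range of 3 to 7 are all hypothetical. In the actual module the sizes would be predicted by the network, and feature values at the resulting fractional positions would typically be read by interpolation.

```python
import math


def odd_count(size, lo=3, hi=7):
    """Map a learned kernel extent to an odd number of sampling points.

    Assumption: the count is the largest odd integer not exceeding size + 1,
    clamped to [lo, hi]. The real mapping may differ.
    """
    k = 2 * math.floor(size / 2) + 1
    return max(lo, min(hi, k))


def linspace(start, stop, n):
    """Return n evenly spaced values from start to stop (n >= 2)."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def rect_offsets(height, width):
    """Build (dy, dx) sampling offsets covering a height x width rectangle.

    The rectangle is centred on the output pixel, and the number of points
    along each axis is chosen from the learned extent of that axis.
    """
    k_h, k_w = odd_count(height), odd_count(width)
    ys = linspace(-height / 2, height / 2, k_h)
    xs = linspace(-width / 2, width / 2, k_w)
    return [(y, x) for y in ys for x in xs]


# Example: a wide, short kernel gets more sampling points along its width.
offsets = rect_offsets(3.0, 6.5)
print(len(offsets), offsets[0], offsets[-1])
```

With these hypothetical settings, a 3.0 by 6.5 kernel yields a 3 by 7 grid, whereas a standard convolution would keep a fixed square grid regardless of the object being covered.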