AI Summary
Existing pansharpening methods are predominantly evaluated at low resolutions and struggle to generalize to real-world high-resolution cross-scale scenarios. To address this limitation, this work introduces the PanScale dataset and the PanScale-Bench benchmark, along with ScaleFormer, the first general-purpose framework specifically designed for cross-scale pansharpening. ScaleFormer incorporates a Scale-Aware Patchify module and rotary positional encoding to parse images into variable-length patch sequences, enabling effective extrapolation to unseen scales. Extensive experiments demonstrate that ScaleFormer consistently outperforms state-of-the-art methods in both fusion quality and cross-scale generalization capability.
Abstract
Pansharpening aims to generate high-resolution multi-spectral (MS) images by fusing the spatial detail of panchromatic images with the spectral richness of low-resolution MS data. However, most existing methods are evaluated only under low-resolution settings, which limits their generalization to real-world, high-resolution scenarios. To bridge this gap, we systematically investigate the data, algorithmic, and computational challenges of cross-scale pansharpening. We first introduce PanScale, the first large-scale, cross-scale pansharpening dataset, accompanied by PanScale-Bench, a comprehensive benchmark for evaluating generalization across varying resolutions and scales. To realize scale generalization, we propose ScaleFormer, a novel architecture designed for multi-scale pansharpening. ScaleFormer reframes generalization across image resolutions as generalization across sequence lengths: it tokenizes images into patch sequences of fixed patch resolution but variable length proportional to image scale. A Scale-Aware Patchify module enables training for such variations from fixed-size crops. ScaleFormer then decouples intra-patch spatial feature learning from inter-patch sequential dependency modeling, incorporating Rotary Positional Encoding to enhance extrapolation to unseen scales. Extensive experiments show that our approach outperforms state-of-the-art methods in both fusion quality and cross-scale generalization. The datasets and source code will be made available upon acceptance.
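The core idea of recasting resolution generalization as sequence-length generalization can be illustrated with a minimal patchify sketch: the patch size stays fixed, so a larger input simply produces a longer token sequence with the same per-token dimension. This is a hypothetical NumPy illustration, not the paper's actual Scale-Aware Patchify implementation, which is not publicly specified here.

```python
import numpy as np

def patchify(img: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened patches.

    Because the patch size is fixed, the sequence length grows with
    image resolution: larger inputs yield more tokens, not bigger ones.
    """
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0, "image must tile evenly"
    # (H//p, p, W//p, p, C) -> (H//p * W//p, p * p * C)
    x = img.reshape(H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, patch * patch * C)

# Same token dimension at every scale; only the sequence length changes.
lo = patchify(np.zeros((64, 64, 4)))    # 16 tokens of dim 1024
hi = patchify(np.zeros((256, 256, 4)))  # 256 tokens of dim 1024
```

Under this framing, a sequence model with relative positional encoding (such as the rotary encoding the paper adopts) can, in principle, be applied to the longer sequences that arise at test-time scales never seen during training.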