Cross-Scale Pansharpening via ScaleFormer and the PanScale Benchmark

📅 2026-02-28
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing pansharpening methods are predominantly evaluated at low resolutions and struggle to generalize to real-world, high-resolution, cross-scale scenarios. To address this limitation, this work introduces the PanScale dataset and the PanScale-Bench benchmark, along with ScaleFormer, the first general-purpose framework specifically designed for cross-scale pansharpening. ScaleFormer incorporates a Scale-Aware Patchify module and Rotary Positional Encoding to parse images into variable-length patch sequences, enabling effective extrapolation to unseen scales. Extensive experiments show that ScaleFormer consistently outperforms state-of-the-art methods in both fusion quality and cross-scale generalization.

๐Ÿ“ Abstract
Pansharpening aims to generate high-resolution multi-spectral (MS) images by fusing the spatial detail of panchromatic images with the spectral richness of low-resolution MS data. However, most existing methods are evaluated under limited, low-resolution settings, which restricts their generalization to real-world, high-resolution scenarios. To bridge this gap, we systematically investigate the data, algorithmic, and computational challenges of cross-scale pansharpening. We first introduce PanScale, the first large-scale, cross-scale pansharpening dataset, accompanied by PanScale-Bench, a comprehensive benchmark for evaluating generalization across varying resolutions and scales. To realize scale generalization, we propose ScaleFormer, a novel architecture designed for multi-scale pansharpening. ScaleFormer reframes generalization across image resolutions as generalization across sequence lengths: it tokenizes images into patch sequences of fixed patch resolution but variable length proportional to image scale. A Scale-Aware Patchify module enables training for such variations from fixed-size crops. ScaleFormer then decouples intra-patch spatial feature learning from inter-patch sequential dependency modeling, incorporating Rotary Positional Encoding to enhance extrapolation to unseen scales. Extensive experiments show that our approach outperforms state-of-the-art methods in fusion quality and cross-scale generalization. The datasets and source code will be made available upon acceptance.
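The abstract's central reframing — resolution generalization as sequence-length generalization — can be sketched in a few lines. This is an illustrative toy (hypothetical function names, not the authors' implementation): with a fixed patch resolution, a larger input image simply produces a longer token sequence, which is what the transformer must extrapolate over.

```python
import numpy as np

def patchify(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split a (C, H, W) image into a sequence of flattened patches.

    The patch resolution stays fixed, so doubling the image side length
    yields a 4x longer token sequence: scale generalization becomes
    sequence-length generalization.
    """
    c, h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    x = image.reshape(c, gh, patch, gw, patch)
    x = x.transpose(1, 3, 0, 2, 4)           # (gh, gw, C, patch, patch)
    return x.reshape(gh * gw, c * patch * patch)

# A 4-band 64x64 crop vs. a 4-band 256x256 image at an unseen scale:
low = patchify(np.zeros((4, 64, 64)))    # 16 tokens of dim 1024
high = patchify(np.zeros((4, 256, 256))) # 256 tokens of the same dim
print(low.shape, high.shape)             # (16, 1024) (256, 1024)
```

The paper's Scale-Aware Patchify module and Rotary Positional Encoding build on this tokenization to make training from fixed-size crops transfer to the longer sequences produced by higher-resolution inputs.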
Problem

Research questions and friction points this paper is trying to address.

pansharpening
cross-scale
generalization
high-resolution
benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

ScaleFormer
PanScale
cross-scale pansharpening
scale generalization
Rotary Positional Encoding