🤖 AI Summary
Existing deep learning-based pansharpening methods for remote sensing imagery generalize poorly, while zero-shot approaches often suffer from low fusion quality, high computational cost, and slow convergence. This work proposes FMG-Pan, a framework that introduces an instance-adaptive mechanism combining pretrained model guidance with physical fidelity constraints. A pretrained model guides a lightweight adaptive network, which is optimized jointly and rapidly on a single image using spectral and physical fidelity terms to preserve spatial detail and spectral consistency at the same time. On real-world datasets such as WorldView-3, the method achieves state-of-the-art performance and completes training and inference for a 512×512×8 image within 3 seconds on an RTX 3090 GPU. FMG-Pan is significantly faster than existing zero-shot methods while demonstrating strong cross-sensor generalization.
📝 Abstract
Pansharpening aims to generate high-resolution multispectral (HRMS) images by fusing low-resolution multispectral (LRMS) and high-resolution panchromatic (PAN) images while preserving both spectral and spatial information. Although deep learning (DL)-based pansharpening methods achieve impressive performance, they incur high training costs, require large datasets, and often degrade when the test distribution differs from the training distribution, which limits generalization. Recent zero-shot methods, trained on a single PAN/LRMS pair, offer strong generalization but suffer from limited fusion quality, high computational overhead, and slow convergence. To address these issues, we propose FMG-Pan, a fast and generalizable model-guided instance-wise adaptation framework for real-world pansharpening that achieves both cross-sensor generality and rapid training and inference. The framework leverages a pretrained model to guide a lightweight adaptive network through joint optimization with spectral and physical fidelity constraints. We further design a novel physical fidelity term to enhance spatial detail preservation. Extensive experiments on real-world datasets under both intra- and cross-sensor settings demonstrate state-of-the-art performance. On the WorldView-3 dataset, FMG-Pan completes training and inference for a 512×512×8 image within 3 seconds on an RTX 3090 GPU, significantly faster than existing zero-shot methods, making it suitable for practical deployment.
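To make the idea of joint optimization with guidance and fidelity terms more concrete, the following is a minimal NumPy sketch of what such a per-image objective could look like. It is not the paper's formulation: the abstract does not specify the loss terms, so the sketch assumes the observation models commonly used in pansharpening (the LRMS image as a spatially degraded HRMS image, and the PAN image as a linear combination of HRMS bands). The function names `downsample` and `joint_loss`, the block-average degradation, the L1 distances, the `band_weights` vector, and the `lam_*` weights are all hypothetical choices made for illustration.

```python
import numpy as np


def downsample(img, scale):
    """Block-average an (H, W, C) image by an integer factor.

    Assumption: a simple stand-in for the sensor's blur + decimation.
    H and W must be divisible by `scale`.
    """
    h, w, c = img.shape
    return img.reshape(h // scale, scale, w // scale, scale, c).mean(axis=(1, 3))


def joint_loss(hrms, lrms, pan, guide, band_weights, scale=4,
               lam_guide=1.0, lam_spec=1.0, lam_phys=1.0):
    """Hypothetical per-image objective with three L1 terms.

    hrms:         (H, W, C) candidate fused image from the adaptive network
    lrms:         (H/scale, W/scale, C) observed low-resolution multispectral image
    pan:          (H, W) observed panchromatic image
    guide:        (H, W, C) output of a pretrained model, used as guidance
    band_weights: (C,) assumed spectral response mapping HRMS bands to PAN
    """
    # Guidance: stay close to the pretrained model's prediction.
    guide_term = np.mean(np.abs(hrms - guide))
    # Spectral fidelity: the degraded fused image should match the LRMS input.
    spec_term = np.mean(np.abs(downsample(hrms, scale) - lrms))
    # Physical (spatial) fidelity: a PAN image synthesized from the bands
    # should match the observed PAN image.
    synth_pan = hrms @ band_weights
    phys_term = np.mean(np.abs(synth_pan - pan))
    return lam_guide * guide_term + lam_spec * spec_term + lam_phys * phys_term


# Synthetic check: observations generated from a known image give zero loss.
rng = np.random.default_rng(0)
truth = rng.random((16, 16, 8))
weights = np.full(8, 1.0 / 8)
lrms_obs = downsample(truth, 4)
pan_obs = truth @ weights
loss_at_truth = joint_loss(truth, lrms_obs, pan_obs, truth, weights)
loss_perturbed = joint_loss(truth + 0.1, lrms_obs, pan_obs, truth, weights)
```

In an actual zero-shot setting, a lightweight network would produce `hrms` and an autodiff framework would minimize an objective of this kind for a small number of steps on the single PAN/LRMS pair; the sketch only evaluates the objective and does not perform any optimization.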