Training and Inference within 1 Second -- Tackle Cross-Sensor Degradation of Real-World Pansharpening with Efficient Residual Feature Tailoring

📅 2025-08-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the poor cross-sensor generalization of deep learning-based pansharpening models, this paper proposes a feature-level adaptation method that requires no additional training data. The approach modularly decomposes the fusion process, inserts a plug-and-play Feature Tailor module at the feature level, trains it with physics-aware unsupervised losses, and performs patch-wise parallel inference. It completes end-to-end training and inference in under one second (0.2 s for a 512×512×8 multispectral image on an RTX 3090), accelerating zero-shot baselines by over two orders of magnitude while substantially reducing computational cost. Evaluated on diverse real-world remote sensing datasets, it attains state-of-the-art performance with strong robustness and efficiency. Overall, the solution provides a lightweight, plug-and-play framework for cross-sensor pansharpening.
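The summary names "physics-aware unsupervised losses" without detailing them. A common formulation in unsupervised pansharpening (a sketch of the general idea, not necessarily this paper's exact losses; all names here are hypothetical) combines a spectral term, where the fused image blurred and downsampled should match the observed low-resolution multispectral input, with a spatial term, where a band-averaged intensity should match the panchromatic input:

```python
import numpy as np

def box_downsample(img, r):
    """Blur + downsample (H, W, C) by averaging r x r blocks."""
    H, W, C = img.shape
    return img.reshape(H // r, r, W // r, r, C).mean(axis=(1, 3))

def unsupervised_pansharpen_loss(fused, ms_lr, pan, ratio=4, lam=1.0):
    """Hypothetical physics-aware loss: spectral + spatial consistency.

    fused : (H, W, C) pansharpened estimate
    ms_lr : (H/ratio, W/ratio, C) observed low-res multispectral image
    pan   : (H, W) observed panchromatic image
    """
    # Spectral: degrading the fused image should recover the MS observation.
    spectral = np.mean((box_downsample(fused, ratio) - ms_lr) ** 2)
    # Spatial: band-averaged intensity should resemble the PAN observation.
    spatial = np.mean((fused.mean(axis=2) - pan) ** 2)
    return spectral + lam * spatial
```

Because both terms compare the network output against the test inputs themselves, such a loss needs no reference high-resolution image, which is what enables training directly on (part of) the test scene.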

📝 Abstract
Deep learning methods for pansharpening have advanced rapidly, yet models pretrained on data from a specific sensor often generalize poorly to data from other sensors. Existing approaches to this cross-sensor degradation either retrain the model or apply zero-shot methods, but these are highly time-consuming or even require extra training data. To address these challenges, our method first performs a modular decomposition of deep learning-based pansharpening models, revealing a general yet critical interface where high-dimensional fused features begin mapping to the channel space of the final image. A Feature Tailor is then integrated at this interface to address cross-sensor degradation at the feature level, and is trained efficiently with physics-aware unsupervised losses. Moreover, our method operates in a patch-wise manner, training on partial patches and performing parallel inference on all patches to boost efficiency. It offers two key advantages: (1) improved generalization ability: it significantly enhances performance in cross-sensor cases; (2) low generalization cost: it achieves sub-second training and inference, requiring only partial test inputs and no external data, whereas prior methods often take minutes or even hours. Experiments on real-world data from multiple datasets demonstrate that our method achieves state-of-the-art quality and efficiency in tackling cross-sensor degradation: for example, training and inference on a 512×512×8 image within 0.2 seconds and on a 4000×4000×8 image within 3 seconds at the fastest setting on a commonly used RTX 3090 GPU, over 100 times faster than zero-shot methods.
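The abstract describes splitting a pretrained model at the interface where fused features map to output channels and inserting a Feature Tailor there. As a minimal sketch of that idea (the split, the adapter shape, and all names are assumptions for illustration, not the paper's architecture), one can picture a frozen backbone and a frozen channel-mapping head with a lightweight residual adapter between them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pretrained model, split at the feature-to-channel interface.
W_feat = rng.standard_normal((16, 32))   # backbone: inputs -> fused features
W_head = rng.standard_normal((32, 8))    # head: features -> output channels

def backbone(x):
    """Frozen feature extractor: (N, 16) inputs -> (N, 32) fused features."""
    return np.tanh(x @ W_feat)

def head(f):
    """Frozen channel mapping: (N, 32) features -> (N, 8) output channels."""
    return f @ W_head

# Feature Tailor: a small residual adapter, the only trainable part.
W_tailor = np.zeros((32, 32))            # zero init => identity at start

def tailored_forward(x):
    f = backbone(x)                      # frozen features
    f = f + f @ W_tailor                 # residual feature tailoring
    return head(f)                       # frozen channel mapping
```

With zero initialization, the residual path vanishes and the tailored model reproduces the pretrained output exactly; unsupervised training then only has to learn the small correction `W_tailor`, which is what keeps adaptation fast.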
Problem

Research questions and friction points this paper is trying to address.

Address cross-sensor degradation in pansharpening models
Reduce time-consuming retraining and data dependency
Achieve fast training and inference with high efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular decomposition of pansharpening models
Feature Tailor for cross-sensor degradation
Patch-wise training and parallel inference
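The patch-wise scheme above can be sketched as splitting the image into non-overlapping blocks, running the model on each block independently (in practice, in parallel on the GPU), and stitching the results back. In this toy sketch (names hypothetical) the stand-in model is pointwise, so the stitched result matches full-image inference exactly:

```python
import numpy as np

def split_patches(img, p):
    """Split (H, W, C) into a row-major list of (p, p, C) blocks."""
    H, W, _ = img.shape
    return [img[i:i + p, j:j + p]
            for i in range(0, H, p) for j in range(0, W, p)]

def merge_patches(patches, H, W, p):
    """Inverse of split_patches: stitch blocks back into (H, W, C)."""
    out = np.empty((H, W, patches[0].shape[2]), dtype=patches[0].dtype)
    k = 0
    for i in range(0, H, p):
        for j in range(0, W, p):
            out[i:i + p, j:j + p] = patches[k]
            k += 1
    return out

def model(patch):
    """Stand-in fusion model (pointwise, so patching is exact here)."""
    return 2.0 * patch + 1.0

img = np.arange(8 * 8 * 3, dtype=float).reshape(8, 8, 3)
patches = split_patches(img, 4)
fused = merge_patches([model(pt) for pt in patches], 8, 8, 4)
```

Training on only a subset of patches while inferring on all of them is what decouples adaptation cost from image size; for a real fusion model with spatial context, overlapping patches with blended borders would be needed to avoid seams.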
👥 Authors
Tianyu Xin — University of Electronic Science and Technology of China
Jin-Liang Xiao — University of Electronic Science and Technology of China
Zeyu Xia — University of Electronic Science and Technology of China
Shan Yin — University of Electronic Science and Technology of China
Liang-Jian Deng — University of Electronic Science and Technology of China