Universal Pansharpening Foundation Model

📅 2026-03-04
🤖 AI Summary
This work proposes FoundPS, a satellite-agnostic and scene-robust foundation model for pansharpening, addressing the limited generalization of existing methods that are typically tailored to specific satellites and scenes. FoundPS combines four components: a modality-interleaved transformer that learns band-wise reversible spectral affine bases and maps multi-spectral images with an arbitrary number of bands into a unified latent space; a latent diffusion bridge that progressively evolves the latent representations; bridge posterior sampling, which couples the latent diffusion with pixel-space observations for stable and controllable fusion; and an infinite-dimensional pixel-to-latent interaction mechanism that captures dependencies between panchromatic observations and multi-spectral representations. Evaluated on PSBench, a newly constructed large-scale benchmark of image pairs from multiple satellites, FoundPS consistently outperforms state-of-the-art approaches and generalizes robustly across diverse sensors and scenes.

📝 Abstract
Pansharpening generates a high-resolution multi-spectral (MS) image by integrating spatial details from a texture-rich panchromatic (PAN) image with spectral attributes from a low-resolution MS image. Existing methods are predominantly satellite-specific and scene-dependent, which severely limits their generalization across heterogeneous sensors and varied scenes and reduces their real-world practicality. To address these challenges, we present FoundPS, a universal pansharpening foundation model for satellite-agnostic and scene-robust fusion. Specifically, we introduce a modality-interleaved transformer that learns band-wise modal specializations to form reversible spectral affine bases, mapping arbitrary-band MS into a unified latent space via tensor multiplication. Building upon this, we construct a latent diffusion bridge model to progressively evolve latent representations, and incorporate bridge posterior sampling to couple latent diffusion with pixel-space observations, enabling stable and controllable fusion. Furthermore, we devise infinite-dimensional pixel-to-latent interaction mechanisms to comprehensively capture the cross-domain dependencies between PAN observations and MS representations, thereby facilitating complementary information fusion. In addition, to support large-scale training and evaluation, we construct a comprehensive pansharpening benchmark, termed PSBench, consisting of worldwide MS and PAN image pairs from multiple satellites across diverse scenes. Extensive experiments demonstrate that FoundPS consistently outperforms state-of-the-art methods, exhibiting superior generalization and robustness across a wide range of pansharpening tasks.
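
The idea of mapping MS images with any number of bands into one latent space through a reversible affine basis can be illustrated with a small NumPy sketch. This is not the FoundPS implementation: in the paper the bases are produced by the modality-interleaved transformer, whereas here they are random matrices, and the function names (make_basis, to_latent, from_latent, round_trip_error), the latent width of 16, and the use of a pseudo-inverse for the reverse mapping are all assumptions made for illustration. The sketch only shows that a per-pixel affine map along the band axis can send 4-band and 8-band inputs to the same latent shape and still be inverted exactly when the latent width is at least the band count.

```python
import numpy as np


def make_basis(num_bands, latent_dim, seed=0):
    """Hypothetical stand-in for a learned basis: random weight and bias."""
    rng = np.random.default_rng(seed)
    weight = rng.standard_normal((num_bands, latent_dim))  # (C, D)
    bias = rng.standard_normal(latent_dim)                 # (D,)
    return weight, bias


def to_latent(ms, weight, bias):
    """Map an MS image (C, H, W) to a latent (D, H, W) along the band axis."""
    return np.einsum("chw,cd->dhw", ms, weight) + bias[:, None, None]


def from_latent(latent, weight, bias):
    """Invert the affine map; exact when weight has full row rank (D >= C)."""
    weight_pinv = np.linalg.pinv(weight)  # (D, C)
    return np.einsum("dhw,dc->chw", latent - bias[:, None, None], weight_pinv)


def round_trip_error(num_bands, latent_dim=16, size=8):
    """Return the latent shape and the max reconstruction error for a random image."""
    rng = np.random.default_rng(num_bands)
    ms = rng.random((num_bands, size, size))
    weight, bias = make_basis(num_bands, latent_dim)
    latent = to_latent(ms, weight, bias)
    recon = from_latent(latent, weight, bias)
    return latent.shape, float(np.abs(recon - ms).max())


if __name__ == "__main__":
    for bands in (4, 8):  # e.g. a 4-band and an 8-band sensor
        shape, err = round_trip_error(bands)
        print(bands, shape, err)
```

Both band counts land in a latent of the same shape, which is what lets a single downstream model operate on data from different sensors; the pseudo-inverse is just one simple way to make such a map reversible.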
Problem

Research questions and friction points this paper is trying to address.

pansharpening
generalization
satellite-agnostic
scene-robust
cross-sensor
Innovation

Methods, ideas, or system contributions that make the work stand out.

universal pansharpening
modality-interleaved transformer
latent diffusion bridge
pixel-to-latent interaction
foundation model
Authors
Hebaixu Wang
School of Electronic Information, Wuhan University, Wuhan, China
Jing Zhang
School of Computer Science, Wuhan University, Wuhan, China
Haonan Guo
LIESMARS, Wuhan University
Di Wang
School of Computer Science, Wuhan University
Remote Sensing, Deep Learning, Computer Vision, Hyperspectral Image Classification
Jiayi Ma
Wuhan University
Computer Vision, Image Fusion, Image Matching
Bo Du
Department of Management, Griffith Business School
Sustainable Transport, Travel Behaviour, Urban Data Analytics, Logistics and Supply Chain
Liangpei Zhang
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China