Universal Pansharpening Foundation Model

📅 2026-03-04

🤖 AI Summary
This work proposes FoundPS, a satellite-agnostic and scene-robust foundation model for pansharpening, addressing the limited generalization of existing methods, which are typically tailored to specific satellites and scenes. FoundPS combines four components: a modality-interleaved transformer that learns reversible spectral affine bases and maps multi-spectral images with an arbitrary number of bands into a unified latent space; a latent diffusion bridge that progressively evolves the latent representations; bridge posterior sampling, which couples the latent diffusion with pixel-space observations for stable and controllable fusion; and infinite-dimensional pixel-to-latent interaction mechanisms that capture the dependencies between PAN observations and MS representations. Evaluated on PSBench, a newly constructed large-scale benchmark of MS and PAN image pairs from multiple satellites, FoundPS is reported to outperform state-of-the-art approaches and to generalize robustly across diverse sensors and scenes.

📝 Abstract
Pansharpening generates a high-resolution multi-spectral (MS) image by integrating spatial details from a texture-rich panchromatic (PAN) image and spectral attributes from a low-resolution MS image. Existing methods are predominantly satellite-specific and scene-dependent, which severely limits their generalization across heterogeneous sensors and varied scenes, thereby reducing their real-world practicality. To address these challenges, we present FoundPS, a universal pansharpening foundation model for satellite-agnostic and scene-robust fusion. Specifically, we introduce a modality-interleaved transformer that learns band-wise modal specializations to form reversible spectral affine bases, mapping arbitrary-band MS into a unified latent space via tensor multiplication. Building upon this, we construct a latent diffusion bridge model to progressively evolve latent representations, and incorporate bridge posterior sampling to couple latent diffusion with pixel-space observations, enabling stable and controllable fusion. Furthermore, we devise infinite-dimensional pixel-to-latent interaction mechanisms to comprehensively capture the cross-domain dependencies between PAN observations and MS representations, thereby facilitating complementary information fusion. In addition, to support large-scale training and evaluation, we construct a comprehensive pansharpening benchmark, termed PSBench, consisting of worldwide MS and PAN image pairs from multiple satellites across diverse scenes. Extensive experiments demonstrate that FoundPS consistently outperforms state-of-the-art methods, exhibiting superior generalization and robustness across a wide range of pansharpening tasks.
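
The idea of mapping MS images with different band counts into one latent space "via tensor multiplication", and mapping them back, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract says the bases are learned by the transformer, whereas the sketch substitutes random orthonormal bases so that the inverse is exact. The names `LATENT_DIM`, `make_affine_basis`, `encode`, and `decode` are hypothetical, as is the latent size of 16.

```python
import numpy as np

LATENT_DIM = 16  # assumed size of the shared latent space (illustrative only)


def make_affine_basis(num_bands, rng, latent_dim=LATENT_DIM):
    """Return a (num_bands, latent_dim) basis with orthonormal rows and an offset.

    Random stand-in for a learned basis; orthonormal rows make the affine
    map exactly invertible as long as num_bands <= latent_dim.
    """
    if num_bands > latent_dim:
        raise ValueError("exact inversion needs num_bands <= latent_dim")
    # Reduced QR of a (latent_dim, num_bands) matrix gives orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((latent_dim, num_bands)))
    return q.T, rng.standard_normal(latent_dim)


def encode(ms, basis, offset):
    """Map an (H, W, C) image to an (H, W, latent_dim) latent by contracting over bands."""
    return np.einsum("hwc,cd->hwd", ms, basis) + offset


def decode(latent, basis, offset):
    """Invert encode: remove the offset, then project back onto the band axis."""
    return np.einsum("hwd,cd->hwc", latent - offset, basis)


rng = np.random.default_rng(0)
ms4 = rng.random((8, 8, 4))  # e.g. a 4-band sensor
ms8 = rng.random((8, 8, 8))  # e.g. an 8-band sensor
basis4, offset4 = make_affine_basis(4, rng)
basis8, offset8 = make_affine_basis(8, rng)
z4 = encode(ms4, basis4, offset4)
z8 = encode(ms8, basis8, offset8)
```

Both `z4` and `z8` have shape `(8, 8, 16)`, so a single downstream model could in principle process either sensor, and `decode` recovers the original bands up to floating-point error.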

Problem

Research questions and friction points this paper is trying to address.

pansharpening
generalization
satellite-agnostic
scene-robust
cross-sensor

Innovation

Methods, ideas, or system contributions that make the work stand out.

universal pansharpening
modality-interleaved transformer
latent diffusion bridge
pixel-to-latent interaction
foundation model