When Earth Foundation Models Meet Diffusion: An Application to Land Surface Temperature Super-Resolution

📅 2026-04-18

🤖 AI Summary
This study addresses the reconstruction of land surface temperature (LST) under extreme spatial degradation, here a 32× scale gap, by proposing EFDiff, a framework that couples the Earth foundation model Prithvi-EO-2.0 with a diffusion model. Prithvi-EO-2.0 encodes high-resolution multispectral reflectance into geospatial embeddings, which enter the denoising network through cross-attention and act as priors that guide thermal image super-resolution. Evaluated on a global benchmark of 242,416 co-registered Landsat thermal-reflectance patches, EFDiff consistently outperforms baseline methods, and cross-attention guidance proves more effective than channel concatenation. The framework includes two variants, EFDiff-ε and EFDiff-x₀, which offer complementary trade-offs between perceptual realism and pixel-level fidelity.
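The variant names EFDiff-ε and EFDiff-x₀ suggest the two usual diffusion prediction targets: the added noise ε or the clean image x₀. The summary does not state the noise schedule or the training loss, so the sketch below only assumes the standard DDPM forward process x_t = sqrt(ᾱ_t)·x₀ + sqrt(1 − ᾱ_t)·ε and shows how one target can be recovered from the other. The function names and the value of `alpha_bar_t` are hypothetical, and random arrays stand in for an LST patch.

```python
import numpy as np

def q_sample(x0, eps, alpha_bar_t):
    """Assumed DDPM forward process: mix the clean image x0 with noise eps."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def x0_from_eps(x_t, eps_hat, alpha_bar_t):
    """Recover a clean-image estimate from a noise prediction."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)

def eps_from_x0(x_t, x0_hat, alpha_bar_t):
    """Recover a noise estimate from a clean-image prediction."""
    return (x_t - np.sqrt(alpha_bar_t) * x0_hat) / np.sqrt(1.0 - alpha_bar_t)

# Hypothetical stand-ins: a random 8x8 "LST patch" and one timestep.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))
eps = rng.standard_normal((8, 8))
alpha_bar_t = 0.3  # illustrative cumulative signal level, not from the paper

x_t = q_sample(x0, eps, alpha_bar_t)
x0_rec = x0_from_eps(x_t, eps, alpha_bar_t)
eps_rec = eps_from_x0(x_t, x0, alpha_bar_t)
```

With exact predictions the two parameterizations carry the same information. In practice they differ in which quantity the network's error lands on, which is one plausible reason for the realism versus fidelity trade-off the summary reports.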

📝 Abstract
Land surface temperature (LST) super-resolution is important for environmental monitoring. However, it remains challenging as coarse thermal observations severely underdetermine fine-scale structure. In this paper, we propose Earth Foundation Model-guided Diffusion (EFDiff), a novel framework for super-resolution under extreme spatial degradation. EFDiff uses the Prithvi-EO-2.0 Earth foundation model (EFM) to encode high-resolution multispectral reflectance into geospatial embeddings, which are injected into the denoising network via cross-attention to guide fine-scale reconstruction from highly degraded observations. We study two variants, EFDiff-$\epsilon$ and EFDiff-$x_0$, which offer complementary trade-offs between perceptual realism and pixel-level fidelity. We evaluate EFDiff under an extreme $32\times$ scale gap using a globally diverse benchmark comprising 242,416 co-registered Landsat thermal-reflectance patches. Results show that EFDiff consistently outperforms baseline methods and that cross-attention conditioning on EFM embeddings is more effective than HLS channel concatenation. Although we present EFDiff in the context of LST super-resolution, the framework is broadly applicable to remote sensing problems in which pretrained geospatial representations can guide generative reconstruction.
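To make the conditioning mechanism concrete, here is a minimal single-head cross-attention sketch in NumPy. It is not the authors' implementation: the token counts, feature widths and projection matrices are hypothetical, and random arrays stand in for both the denoiser features and the Prithvi-EO-2.0 embeddings. Queries come from the denoiser features; keys and values come from the embedding tokens.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(feats, context, w_q, w_k, w_v):
    """Single-head cross-attention.

    feats:   (N, C) denoiser feature tokens (source of queries)
    context: (M, D) conditioning tokens (source of keys and values)
    """
    q = feats @ w_q                                   # (N, d)
    k = context @ w_k                                 # (M, d)
    v = context @ w_v                                 # (M, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # (N, M), rows sum to 1
    return attn @ v, attn                             # (N, d), (N, M)

# Hypothetical sizes; none of these values come from the paper.
rng = np.random.default_rng(0)
N, C = 64, 32   # denoiser tokens and their channel width
M, D = 16, 48   # embedding tokens and their width
d = 32          # shared attention width

feats = rng.standard_normal((N, C))
context = rng.standard_normal((M, D))   # stand-in for geospatial embeddings
w_q = rng.standard_normal((C, d)) / np.sqrt(C)
w_k = rng.standard_normal((D, d)) / np.sqrt(D)
w_v = rng.standard_normal((D, d)) / np.sqrt(D)

out, attn = cross_attention(feats, context, w_q, w_k, w_v)
```

Note that `M` need not equal `N`: each denoiser token attends over all embedding tokens. Channel concatenation, by contrast, requires the conditioning input to be resampled onto the same spatial grid as the denoiser input before the two are stacked along the channel axis.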
Problem

Research questions and friction points this paper is trying to address.

Land Surface Temperature
Super-Resolution
Remote Sensing
Spatial Degradation
Thermal Imagery
Innovation

Methods, ideas, or system contributions that make the work stand out.

Earth Foundation Model
Diffusion Model
Super-Resolution
Cross-Attention
Land Surface Temperature