Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models?

📅 2025-03-10
🤖 AI Summary
This work investigates whether generative diffusion models can serve as highly discriminative geospatial foundation models (GFMs) for remote sensing tasks. To this end, the authors propose SatDiFuser, a framework that bridges generative pretraining and discriminative downstream tasks by systematically analyzing multi-stage, noise-dependent diffusion features and combining them through three noise-adaptive fusion strategies. Rather than following conventional fine-tuning paradigms, the approach repurposes a pretrained diffusion model as a discriminative representation learner. Evaluated on multiple remote sensing benchmarks for semantic segmentation and image classification, SatDiFuser achieves gains of up to +5.7% mIoU in segmentation and +7.9% F1-score in classification, surpassing state-of-the-art GFMs. The results provide evidence that diffusion-based pretraining, when properly adapted, yields highly effective discriminative representations for geospatial vision tasks.

📝 Abstract
Self-supervised learning (SSL) has revolutionized representation learning in Remote Sensing (RS), advancing Geospatial Foundation Models (GFMs) to leverage vast unlabeled satellite imagery for diverse downstream tasks. Currently, GFMs primarily focus on discriminative objectives, such as contrastive learning or masked image modeling, owing to their proven success in learning transferable representations. However, generative diffusion models--which demonstrate the potential to capture multi-grained semantics essential for RS tasks during image generation--remain underexplored for discriminative applications. This prompts the question: can generative diffusion models also excel and serve as GFMs with sufficient discriminative power? In this work, we answer this question with SatDiFuser, a framework that transforms a diffusion-based generative geospatial foundation model into a powerful pretraining tool for discriminative RS. By systematically analyzing multi-stage, noise-dependent diffusion features, we develop three fusion strategies to effectively leverage these diverse representations. Extensive experiments on remote sensing benchmarks show that SatDiFuser outperforms state-of-the-art GFMs, achieving gains of up to +5.7% mIoU in semantic segmentation and +7.9% F1-score in classification, demonstrating the capacity of diffusion-based generative foundation models to rival or exceed discriminative GFMs. Code will be released.
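The abstract describes fusing multi-stage, noise-dependent diffusion features, but the paper's three fusion strategies are not detailed here. As a minimal illustrative sketch only, the snippet below assumes one plausible variant: features are extracted from a denoising network at several noise timesteps and combined with softmax-normalized global weights. The function names (`fuse_features`, `softmax`) and the timestep keys are hypothetical stand-ins, not SatDiFuser's actual API.

```python
import math

def softmax(weights):
    """Normalize raw fusion weights into a probability distribution."""
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_features(feats_by_timestep, weights):
    """Weighted fusion of per-timestep feature vectors.

    feats_by_timestep: dict mapping a diffusion noise timestep t to the
    feature vector extracted from the denoiser at that noise level
    (a hypothetical stand-in for multi-stage diffusion features).
    weights: one raw scalar weight per timestep, softmax-normalized.
    """
    alphas = softmax(weights)
    dim = len(next(iter(feats_by_timestep.values())))
    fused = [0.0] * dim
    for alpha, feat in zip(alphas, feats_by_timestep.values()):
        for i, v in enumerate(feat):
            fused[i] += alpha * v
    return fused

# Toy usage: three noise levels, equal raw weights -> uniform average.
feats = {50: [1.0, 0.0], 250: [0.0, 1.0], 500: [2.0, 2.0]}
fused = fuse_features(feats, [0.0, 0.0, 0.0])  # -> [1.0, 1.0]
```

In practice the per-timestep features would come from intermediate activations of the diffusion backbone, and the fusion weights could be learned jointly with the downstream head; this sketch only shows the combination step.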
Problem

Research questions and friction points this paper is trying to address.

Discriminative SSL objectives (contrastive learning, masked image modeling) dominate GFM pretraining, while generative diffusion models remain underexplored for discriminative RS applications.
It is unclear whether the multi-grained semantics that diffusion models capture during image generation can be repurposed for discriminative downstream tasks.
Diffusion features are multi-stage and noise-dependent, so leveraging them effectively requires a dedicated analysis and fusion strategy.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transforms a generative diffusion model into a discriminative pretraining tool for RS (SatDiFuser)
Systematically analyzes multi-stage, noise-dependent diffusion features and develops three fusion strategies to leverage them
Outperforms state-of-the-art GFMs by up to +5.7% mIoU (semantic segmentation) and +7.9% F1-score (classification)