🤖 AI Summary
To address the trade-off between the slow update frequency of high-resolution (HR) remote sensing imagery and the insufficient spatial resolution of low-resolution (LR) imagery for smallholder farm boundary delineation, this paper proposes a segmentation-aware reference-based super-resolution framework. Departing from the conventional two-stage paradigm—super-resolution followed by segmentation—the method bypasses pixel-level super-resolution entirely, instead embedding super-resolution directly into the latent space of the segmentation task. This enables end-to-end generation of high-fidelity segmentation maps at an unprecedented 20× upsampling factor. The framework integrates multi-source satellite data—including multispectral imagery and geospatial priors—and combines conditional latent diffusion models with large-scale geospatial foundation models to achieve high-frequency, high-accuracy agricultural land monitoring. Evaluated on two large-scale real-world datasets, the approach surpasses state-of-the-art methods by up to 25.5% in instance-level and 12.9% in semantic-level segmentation metrics.
📝 Abstract
Delineating farm boundaries through segmentation of satellite images is a fundamental step in many agricultural applications. The task is particularly challenging for smallholder farms, where accurate delineation requires the use of high resolution (HR) imagery, which is available only at low revisit frequencies (e.g., annually). To support more frequent (sub-)seasonal monitoring, HR images could be combined as references (ref) with low resolution (LR) images -- having higher revisit frequency (e.g., weekly) -- using reference-based super-resolution (Ref-SR) methods. However, current Ref-SR methods optimize perceptual quality and smooth over crucial features needed for downstream tasks, and are unable to meet the large scale-factor requirements for this task. Further, previous two-step approaches of SR followed by segmentation do not effectively utilize diverse satellite sources as inputs. We address these problems through a new approach, $\textbf{SEED-SR}$, which uses a combination of conditional latent diffusion models and large-scale multi-spectral, multi-source geo-spatial foundation models. Our key innovation is to bypass the explicit SR task in the pixel space and instead perform SR in a segmentation-aware latent space. This unique approach enables us to generate segmentation maps at an unprecedented 20$\times$ scale factor, and rigorous experiments on two large, real datasets demonstrate up to $\textbf{25.5\%}$ and $\textbf{12.9\%}$ relative improvement in instance and semantic segmentation metrics respectively over approaches based on state-of-the-art Ref-SR methods.
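To make the shape flow of the idea concrete, here is a minimal conceptual sketch. It is *not* the authors' implementation: the encoder, denoising step, and decoder are hypothetical stand-ins, and all shapes and names (`encode_conditions`, `denoise_step`, `decode_segmentation`, `LATENT_C`, `LATENT_DOWN`) are illustrative assumptions. Only the 20× scale factor comes from the paper. The point is that diffusion happens in a segmentation-aware latent space conditioned on the LR observation and the HR reference, and the decoder emits a segmentation map directly at the upsampled resolution, with no intermediate HR image.

```python
# Conceptual sketch of the SEED-SR shape flow (hypothetical, not the paper's code):
# LR image + HR reference -> conditioning -> latent diffusion -> segmentation map.
import numpy as np

SCALE = 20                      # scale factor reported in the paper
H_LR = W_LR = 16                # toy low-resolution tile size (assumed)
LATENT_C, LATENT_DOWN = 8, 4    # hypothetical latent channels / spatial downsampling


def encode_conditions(lr, ref):
    """Stand-in for the multi-spectral, multi-source geo-spatial encoders:
    pack LR and reference features into one conditioning tensor."""
    h = lr.shape[0] * SCALE // LATENT_DOWN
    w = lr.shape[1] * SCALE // LATENT_DOWN
    return np.zeros((LATENT_C, h, w))


def denoise_step(z, cond):
    """Stand-in for one conditional latent-diffusion denoising step
    (a toy contraction toward the conditioning, not a real diffusion model)."""
    return z - 0.1 * (z - cond)


def decode_segmentation(z):
    """Stand-in decoder: latent -> per-pixel class map at HR resolution.
    Note no HR *image* is ever produced, only the segmentation map."""
    up = np.repeat(np.repeat(z, LATENT_DOWN, axis=1), LATENT_DOWN, axis=2)
    return up.argmax(axis=0)


lr = np.zeros((H_LR, W_LR))                    # frequent LR observation (e.g., weekly)
ref = np.zeros((H_LR * SCALE, W_LR * SCALE))   # infrequent HR reference (e.g., annual)

cond = encode_conditions(lr, ref)
z = np.random.default_rng(0).normal(size=cond.shape)  # initial latent noise
for _ in range(4):                             # a few toy denoising steps
    z = denoise_step(z, cond)
seg = decode_segmentation(z)
print(seg.shape)                               # segmentation map at 20x the LR grid
```

Running the sketch prints `(320, 320)`: the output segmentation grid is 20× the 16×16 LR input, without ever materializing a super-resolved image in pixel space.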