Beyond Visual Fidelity: Benchmarking Super-Resolution Models for Large-Scale Remote Sensing Imagery via Downstream Task Integration

📅 2026-04-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current super-resolution evaluation relies heavily on fidelity metrics such as PSNR and SSIM, which often fail to reflect the practical utility of reconstructed images in downstream remote sensing tasks. To address this limitation, this work proposes GeoSR-Bench, which the authors describe as the first benchmark that directly integrates Earth observation tasks, including land cover segmentation, infrastructure mapping, and biophysical variable estimation, into the super-resolution evaluation pipeline. It systematically assesses the task-oriented performance of nine super-resolution models spanning GANs, Transformers, neural operators, and diffusion models across 270 experimental configurations. The findings show that improvements in conventional fidelity metrics often correlate weakly, or even negatively, with downstream task performance, underscoring the need for task-centric evaluation of super-resolution in geospatial applications.
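To make the fidelity-versus-task comparison concrete, the following is a minimal sketch of how a fidelity score (PSNR) and a rank correlation between fidelity and task scores could be computed. It is not the paper's evaluation code. The function names, the choice of Spearman correlation, and the four score pairs are assumptions for illustration only; the numbers are made up and are not results from GeoSR-Bench.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB for two equal-length pixel sequences."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def ranks(values):
    """Zero-based rank of each value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation via the no-ties formula 1 - 6*sum(d^2)/(n(n^2-1))."""
    rank_x, rank_y = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Made-up scores for four hypothetical SR models (NOT results from the paper).
psnr_scores = [28.1, 29.4, 30.2, 31.0]   # fidelity, higher is better
task_scores = [0.62, 0.66, 0.61, 0.58]   # e.g. a segmentation score, higher is better

rho = spearman_rho(psnr_scores, task_scores)
print(f"Spearman rho between PSNR and task score: {rho:.2f}")
```

A negative value of `rho` in such a comparison would mean that the models ranked highest by PSNR tend to rank lowest on the downstream task, which is the kind of mismatch the summary describes.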

📝 Abstract

Super-resolution (SR) techniques have made major advances in reconstructing high-resolution images from low-resolution inputs. The increased resolution provides visual enhancement and utility for monitoring tasks. In particular, SR has been increasingly developed for satellite-based Earth observation, with applications in urban planning, agriculture, ecology, and disaster response. However, existing SR studies and benchmarks typically use fidelity metrics such as PSNR or SSIM, whereas the true utility of super-resolved images lies in supporting downstream tasks such as land cover classification, biomass estimation, and change detection. To bridge this gap, we introduce GeoSR-Bench, a downstream task-integrated SR benchmark dataset to evaluate SR models beyond fidelity metrics. GeoSR-Bench comprises spatially co-located, temporally aligned, and quality-controlled image pairs from about 36,000 locations across diverse land covers, spanning resolutions from 500 m to 0.6 m. To the best of our knowledge, GeoSR-Bench is the first SR benchmark that directly connects improved image resolution from SR models with downstream Earth monitoring tasks, including land cover segmentation, infrastructure mapping, and biophysical variable estimation. Using GeoSR-Bench, we benchmark GAN, transformer, neural operator, and diffusion-based SR models on perceptual quality and downstream task performance. We conduct experiments with 270 settings, covering 2 cross-platform SR tasks, 9 SR models, 3 downstream task models, and 5 downstream tasks for each SR task. The results show that improvements in traditional SR metrics often do not correlate with gains in task performance, and the correlations can be negative, indicating that these metrics provide limited guidance for selecting superior models for downstream tasks. This reveals the need to integrate downstream tasks into SR model development and evaluation.
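The count of 270 settings follows from the stated factors: 2 SR tasks × 9 SR models × 3 downstream task models × 5 downstream tasks = 270. The sketch below enumerates such a grid. All identifiers are placeholder labels invented for illustration, since the abstract does not name the individual tasks or models.

```python
from itertools import product

# Placeholder labels (hypothetical); only the counts come from the abstract.
sr_tasks = [f"sr_task_{i}" for i in range(1, 3)]                   # 2 cross-platform SR tasks
sr_models = [f"sr_model_{i}" for i in range(1, 10)]                # 9 SR models
downstream_models = [f"task_model_{i}" for i in range(1, 4)]       # 3 downstream task models
downstream_tasks = [f"downstream_task_{i}" for i in range(1, 6)]   # 5 downstream tasks per SR task

# One experimental setting per combination of the four factors.
settings = list(product(sr_tasks, sr_models, downstream_models, downstream_tasks))

print(len(settings))  # 2 * 9 * 3 * 5 = 270
```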

Problem

Research questions and friction points this paper is trying to address.

super-resolution

remote sensing

downstream tasks

benchmarking

fidelity metrics

Innovation

Methods, ideas, or system contributions that make the work stand out.

super-resolution

downstream task integration

remote sensing