🤖 AI Summary
This work addresses the challenge of efficient high-resolution region selection under limited observation budgets to enhance remote sensing understanding. It formulates cross-scale remote sensing interpretation as a unified cost-aware optimization problem and proposes a joint framework that integrates fine-grained importance sampling with cross-patch contextual modeling. To support this research, the authors introduce GL-10M, the first large-scale, multi-resolution remote sensing benchmark dataset with tens of millions of aligned image pairs. Experimental results demonstrate that the proposed method significantly outperforms existing approaches in both recognition and retrieval tasks, achieving superior performance at the same observational cost and effectively balancing accuracy and computational overhead.
📝 Abstract
Remote sensing understanding inherently requires multi-resolution observation, since different targets and application tasks demand different levels of spatial detail. While low-resolution (LR) imagery enables efficient global observation, high-resolution (HR) imagery provides critical local details at much higher acquisition cost and limited coverage. This motivates a cross-scale sensing strategy that selectively acquires HR imagery from LR-based global perception to improve task performance under constrained cost. Existing methods for HR sampling methods typically make selection decisions from isolated LR patches, which ignore fine-grained intra-patch importance and cross-patch contextual interactions, leading to fragmented feature representation and suboptimal scene reasoning under sparse HR observations. To address this issue, we formulate cross-scale remote sensing understanding as a unified cost-aware problem that couples fine-grained HR sampling with cross-patch representation prediction, enabling more effective task reasoning with fewer HR observations. Furthermore, we present GL-10M, a large-scale benchmark of 10 million spatially aligned multi-resolution images, enabling systematic evaluation of budget-constrained cross-scale reasoning in remote sensing. Extensive experiments on recognition and retrieval tasks show that our method consistently achieves a superior performance-cost trade-off.