🤖 AI Summary
This work addresses the challenges of detecting small objects in low-quality remote sensing imagery, where complex backgrounds, weak signals, and limited spatial resolution hinder performance. Existing approaches that apply super-resolution and then detection in sequence suffer from misaligned optimization objectives and redundant feature representations. To overcome these limitations, the authors propose SDCoNet, a unified framework that implicitly couples super-resolution and detection through a shared Swin Transformer encoder. A multi-scale saliency prediction module emphasizes weak target regions while suppressing background clutter, and a gradient routing strategy alleviates optimization conflicts in multi-task learning. Experiments on three public datasets (NWPU VHR-10-Split, DOTAv1.5-Split, and HRSSD-Split) indicate that SDCoNet significantly outperforms existing mainstream methods in small object detection on low-quality remote sensing images while maintaining competitive computational efficiency.
📝 Abstract
In remote sensing images, complex backgrounds, weak object signals, and small object scales make accurate detection particularly challenging, especially under low-quality imaging conditions. A common strategy is to apply single-image super-resolution (SR) before detection; however, such serial pipelines often suffer from misaligned optimization objectives, feature redundancy, and a lack of effective interaction between SR and detection. To address these issues, we propose a Saliency-Driven multi-task Collaborative Network (SDCoNet) that couples SR and detection through implicit feature sharing while preserving task specificity. SDCoNet employs a Swin Transformer-based shared encoder, in which hierarchical window-shifted self-attention supports cross-task feature collaboration and adaptively balances the trade-off between texture refinement and semantic representation. In addition, a multi-scale saliency prediction module produces importance scores to select key tokens, enabling focused attention on weak object regions and suppression of both background clutter and adverse features introduced by multi-task coupling. Furthermore, a gradient routing strategy is introduced to mitigate optimization conflicts. It first stabilizes detection semantics and subsequently routes SR gradients along a detection-oriented direction, guiding the SR branch to generate high-frequency details that are explicitly beneficial for detection. Experiments on public datasets, including NWPU VHR-10-Split, DOTAv1.5-Split, and HRSSD-Split, demonstrate that the proposed method significantly outperforms existing mainstream algorithms in small object detection on low-quality remote sensing images while maintaining competitive computational efficiency. Our code is available at https://github.com/qiruo-ya/SDCoNet.
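To make the idea of selecting key tokens from importance scores concrete, the following is a minimal sketch in plain Python. It is not taken from the SDCoNet code: the function name `select_salient_tokens`, the `keep_ratio` parameter, and the simple top-k rule are all assumptions made for illustration, and the paper's multi-scale saliency module may select tokens quite differently.

```python
import math
from typing import List, Sequence, Tuple


def select_salient_tokens(
    tokens: Sequence[Sequence[float]],
    scores: Sequence[float],
    keep_ratio: float = 0.5,
) -> Tuple[List[int], List[List[float]]]:
    """Keep the highest-scoring tokens, preserving their original order.

    Hypothetical helper: `keep_ratio` and the top-k rule are illustrative
    assumptions, not the selection rule used by SDCoNet.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    # Number of tokens to keep (at least one when any tokens exist).
    k = max(1, math.ceil(len(tokens) * keep_ratio))

    # Rank token indices by descending saliency score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Restore spatial order so later layers see tokens in their original sequence.
    kept = sorted(ranked[:k])
    return kept, [list(tokens[i]) for i in kept]


# Four toy tokens with 2-D features and one saliency score per token.
tokens = [[0.1, 0.2], [0.9, 0.8], [0.0, 0.1], [0.7, 0.6]]
scores = [0.05, 0.92, 0.01, 0.64]
kept_indices, kept_tokens = select_salient_tokens(tokens, scores, keep_ratio=0.5)
print(kept_indices)
```

In this toy setup, tokens with low scores (standing in for background clutter) are dropped before any further attention is computed, which is one simple way a saliency score could focus computation on weak object regions.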