SFR-Net: Learning Scale-Frustum Representations for Ultra-Wide Area Remote Sensing Image Segmentation

📅 2026-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses segmentation of ultra-wide area (UWA) remote sensing imagery, where a model must handle geographic objects of widely varying scale while preserving long-range semantic continuity. The authors propose a scale-frustum representation, inspired by the viewing frustums of images captured from different altitudes, together with a cascaded cross-scale fusion mechanism that integrates these representations so that multi-scale objects and contextual features are modeled within one framework. The design aims to strengthen local semantic understanding while keeping long-range context consistent. On the GID and FBPS datasets, SFR-Net reports state-of-the-art performance, improving mIoU by 1.72% and 4.29%, respectively, over the strongest competing methods. The scale-frustum representations can also be integrated into generic segmentation networks, where they improve both segmentation accuracy and convergence speed.
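For readers unfamiliar with the metric behind the reported gains, mIoU (mean intersection-over-union) averages the per-class overlap between predicted and reference label maps. The following is a minimal sketch of the standard definition, not the paper's evaluation code; the function name `mean_iou`, the toy arrays, and the choice to skip classes absent from both maps are assumptions made for illustration.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the reference.

    Hypothetical helper for illustration; labels must be integers
    in the range [0, num_classes).
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are reference classes, columns are predictions.
    cm = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    # Assumption: classes with an empty union are left out of the mean.
    valid = union > 0
    return float((intersection[valid] / union[valid]).mean())


# Toy 2x3 label maps with three classes (made-up data).
pred = np.array([[0, 0, 1], [1, 1, 2]])
target = np.array([[0, 1, 1], [1, 1, 2]])
score = mean_iou(pred, target, num_classes=3)
print(score)  # 0.75
```

In this toy case the per-class IoU values are 0.5, 0.75, and 1.0, so the mean is 0.75. Benchmark protocols differ in how they treat ignored labels and absent classes, so scores from this sketch are not directly comparable to published numbers.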
📝 Abstract
Pixel count and geographical coverage are two key characteristics of remote sensing images. Existing remote sensing image segmentation methods typically focus on images with either a small pixel count or a large pixel count but limited geographical coverage. In this paper, we introduce a novel segmentation task targeting ultra-wide area (UWA) remote sensing images, characterized by both a large pixel count and extremely wide geographical coverage. The core challenges of UWA segmentation lie in simultaneously handling ground objects with significantly varying scales and maintaining long-range contextual semantic continuity. To address these challenges, we propose the Scale-Frustum Representation Network (SFR-Net). Inspired by the viewing frustums of remote sensing images captured from different altitudes, we construct scale-frustum representations, enabling unified modeling of ground objects and contextual features at different scales. Furthermore, we design a cascaded cross-scale fusion mechanism to effectively integrate these representations, enhancing local semantic understanding while ensuring long-range contextual continuity. Experimental results on GID and FBPS demonstrate that SFR-Net achieves state-of-the-art performance, improving mIoU by 1.72% and 4.29%, respectively, over the strongest competing methods. In addition, the proposed scale-frustum representations can be integrated into generic segmentation networks to improve both segmentation accuracy and convergence speed. The implementation code will be publicly available at https://github.com/ChuyuZhong/SFR-Net.
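The abstract does not specify how the scale-frustum representations are built. As a rough intuition for the viewing-frustum analogy only, the sketch below extracts nested crops around one location, each covering twice the ground extent of the previous one and subsampled to a common size, much like viewing the same spot from increasing altitude. This is an assumed illustration, not the SFR-Net construction; `nested_context_crops`, its parameters, and the doubling schedule are all hypothetical.

```python
import numpy as np


def nested_context_crops(image, center, base_size, num_levels):
    """Return crops around `center` with doubling extent and equal output size.

    Hypothetical helper for illustration. Assumes each image side is at
    least base_size * 2 ** (num_levels - 1) pixels.
    """
    h, w = image.shape[:2]
    cy, cx = center
    crops = []
    for level in range(num_levels):
        stride = 2 ** level
        extent = base_size * stride
        # Clamp the window so it stays inside the image.
        y0 = min(max(cy - extent // 2, 0), max(h - extent, 0))
        x0 = min(max(cx - extent // 2, 0), max(w - extent, 0))
        window = image[y0:y0 + extent, x0:x0 + extent]
        # Plain strided subsampling; no anti-aliasing filter is applied.
        crops.append(window[::stride, ::stride])
    return crops


# Toy single-band "image" (made-up data).
image = np.arange(64 * 64).reshape(64, 64)
crops = nested_context_crops(image, center=(32, 32), base_size=16, num_levels=3)
for level, crop in enumerate(crops):
    print(level, crop.shape)
```

Every level yields a 16 x 16 array here, with level 0 covering a 16-pixel window and level 2 covering the full 64-pixel image. A real pipeline would more likely use filtered resampling and learned features rather than raw pixel subsampling.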
Problem

Research questions and friction points this paper is trying to address.

ultra-wide area
remote sensing image segmentation
scale variation
long-range contextual continuity
geographical coverage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scale-Frustum Representation
Ultra-Wide Area Remote Sensing
Cross-Scale Fusion
Semantic Segmentation
Multi-Scale Modeling