🤖 AI Summary
Building segmentation models for remote sensing generalize poorly across regions because of urban structural heterogeneity and the scarcity of high-quality annotated data. To address this, the authors propose a test-time re-training framework that leverages OpenStreetMap road network data to procedurally generate geographically representative, photorealistic synthetic remote sensing imagery via physics-based rendering, augmented with domain randomization for greater diversity. The synthetic data is integrated into an adversarial domain adaptation framework that jointly optimizes semantic segmentation and domain alignment, mitigating the model collapse associated with purely synthetic training. Experiments demonstrate median improvements of up to 12% in cross-domain settings, depending on the domain gap. The method is computationally efficient, scalable, and practical to deploy.
📝 Abstract
Deep learning has significantly advanced building segmentation in remote sensing, yet models struggle to generalize across diverse geographic regions due to variations in city layouts and in the distribution of building types, sizes, and locations. Because annotation is time-consuming, the supply of labeled data capturing worldwide diversity may never catch up with the demands of increasingly data-hungry models. We therefore propose a novel approach: re-training models at test time on synthetic data tailored to the target region's city layout. Our method generates geo-typical synthetic data that closely replicates the urban structure of a target area by leveraging geospatial data such as street networks from OpenStreetMap. Using procedural modeling and physics-based rendering, very high-resolution synthetic images are created, with domain randomization applied to building shapes, materials, and environmental illumination. This enables the generation of virtually unlimited training samples that retain the essential characteristics of the target environment. To bridge the synthetic-to-real domain gap, our approach integrates the geo-typical data into an adversarial domain adaptation framework for building segmentation. Experiments demonstrate significant performance gains, with median improvements of up to 12%, depending on the domain gap. This scalable and cost-effective method blends partial geographic knowledge with synthetic imagery, offering a promising remedy for the "model collapse" issue in purely synthetic datasets and a practical pathway to better generalization in remote sensing building segmentation without extensive real-world annotations.
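To make the domain-randomization step concrete, here is a minimal sketch of what per-scene parameter sampling might look like. All parameter names, ranges, and material choices below are hypothetical placeholders for illustration; the paper does not specify them, and an actual pipeline would feed such parameters into a procedural modeler and physics-based renderer.

```python
import random

# Hypothetical material palette -- illustrative only, not the paper's actual asset list.
MATERIALS = ["concrete", "brick", "glass", "metal_roof"]

def sample_scene_params(rng: random.Random) -> dict:
    """Domain randomization: draw one set of rendering parameters for a
    synthetic scene (building shape, material, environmental illumination).
    Ranges are assumed for illustration."""
    return {
        "building_height_m": rng.uniform(3.0, 60.0),   # low-rise to high-rise
        "footprint_aspect": rng.uniform(0.5, 2.0),     # footprint width/depth ratio
        "roof_material": rng.choice(MATERIALS),
        "sun_elevation_deg": rng.uniform(15.0, 75.0),  # lighting variation
        "sun_azimuth_deg": rng.uniform(0.0, 360.0),
    }

# Each street block derived from the OpenStreetMap network would get its own
# randomized draw, yielding virtually unlimited distinct training scenes.
rng = random.Random(42)
scenes = [sample_scene_params(rng) for _ in range(1000)]
```

Randomizing appearance while keeping the real street-network geometry fixed is what lets the synthetic samples stay geo-typical: layout comes from data, while nuisance factors are varied to force the segmentation model to be robust to them.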