🤖 AI Summary
Existing urban land-use mapping methods suffer from low accuracy in complex urban areas for two reasons: remote sensing data lack ground-level detail, and supervised street-view classification is limited by scarce labeled training samples and poor cross-city generalizability.
Method: This paper proposes an unsupervised contrastive clustering framework for street-view images that integrates a geographic prior. The prior is the spatial continuity described by Tobler's First Law of Geography (nearby things are more related than distant things). The framework learns discriminative street-view image representations through contrastive learning and uses the spatial constraint to guide clustering, eliminating reliance on manual annotations.
Contribution/Results: Combined with a simple visual assignment of clusters to land-use labels, the framework supports customizable land-use categories and map updating. Experiments on geotagged street-view datasets from two cities show that it can generate land-use maps. Because it relies only on the spatial coherence of geospatial data, it can be adapted to other settings where street-view images are available, addressing both the lack of ground-level detail and the dependence on annotations.
📝 Abstract
Urban land use classification and mapping are critical for urban planning, resource management, and environmental monitoring. Existing remote sensing techniques often lack precision in complex urban environments due to the absence of ground-level details. Unlike aerial perspectives, street view images provide a ground-level view that captures more human and social activities relevant to land use in complex urban scenes. Existing street view-based methods primarily rely on supervised classification, which is challenged by the scarcity of high-quality labeled data and the difficulty of generalizing across diverse urban landscapes. This study introduces an unsupervised contrastive clustering model for street view images with a built-in geographical prior, to enhance clustering performance. When combined with a simple visual assignment of the clusters, our approach offers a flexible and customizable solution to land use mapping, tailored to the specific needs of urban planners. We experimentally show that our method can generate land use maps from geotagged street view image datasets of two cities. As our methodology relies on the universal spatial coherence of geospatial data ("Tobler's law"), it can be adapted to various settings where street view images are available, to enable scalable, unsupervised land use mapping and updating. The code will be available at https://github.com/lin102/CCGP.
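To make the role of the geographical prior concrete, the following is a minimal sketch of one way spatial proximity could supply positive pairs for a contrastive objective. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the loss or the pairing rule, so the nearest-neighbor pairing, the InfoNCE-style loss, and the names `nearest_neighbor_pairs`, `info_nce`, and `temperature` are all hypothetical.

```python
import numpy as np


def nearest_neighbor_pairs(coords):
    """Return, for each point, the index of its nearest *other* point.

    Assumption for illustration: the spatially closest image is treated
    as the positive sample, as a crude stand-in for a Tobler's-law prior.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    return dist.argmin(axis=1)


def info_nce(embeddings, positives, temperature=0.5):
    """Mean InfoNCE-style loss; positives[i] is the positive index for i."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    # Numerically stable log-softmax over each row.
    shifted = sim - sim.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(z)), positives].mean()


# Toy data: two spatial groups of two images each (hypothetical values).
coords = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
positives = nearest_neighbor_pairs(coords)

# Embeddings that agree with the spatial grouping ...
aligned = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
# ... and embeddings that contradict it.
mismatched = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]])

loss_aligned = info_nce(aligned, positives)
loss_mismatched = info_nce(mismatched, positives)
```

In this toy setup the loss is lower when spatially adjacent images also have similar embeddings, which is the behavior a spatial-coherence prior is meant to encourage. A real pipeline would obtain the embeddings from an image encoder and would follow the representation learning with a clustering step; both are omitted here.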