🤖 AI Summary
This paper addresses key bottlenecks in urban scene change detection (USCD): reliance on small-scale annotated datasets, poor generalization to new cities, and rigid a priori definitions of change. It makes three contributions: (1) AC-1M, by far the largest street-view change detection dataset, with over 1.1M images; (2) EMPLACE, a self-supervised method that trains a Vision Transformer with an adaptive triplet loss and supports both zero-shot use and linear fine-tuning; (3) results showing that EMPLACE outperforms state-of-the-art methods in both settings. In a case study of Amsterdam, EMPLACE detects both small and large urban changes, and the detected changes correlate with housing prices depending on their size, which is in turn indicative of inequity.
📝 Abstract
Urban change is a constant process that influences the perception of neighbourhoods and the lives of the people within them. The field of Urban Scene Change Detection (USCD) aims to capture changes in street scenes using computer vision and can help raise awareness of these changes, making it possible to better understand the city and its residents. Traditionally, the field of USCD has used supervised methods with small-scale datasets. This constrains methods when they are applied to new cities, as it requires labour-intensive labelling and forces a priori definitions of relevant change. In this paper we introduce AC-1M, by far the largest USCD dataset with over 1.1M images, together with EMPLACE, a self-supervised method to train a Vision Transformer using our adaptive triplet loss. We show that EMPLACE outperforms SOTA methods both as a pre-training method for linear fine-tuning and in a zero-shot setting. Lastly, in a case study of Amsterdam, we show that we are able to detect both small and large changes throughout the city and that changes uncovered by EMPLACE, depending on size, correlate with housing prices, which in turn is indicative of inequity.
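To make the two ingredients named above more concrete, the following is a minimal sketch of a standard fixed-margin triplet loss and of a zero-shot change score based on cosine distance between two embeddings. It is not the EMPLACE implementation: the abstract does not describe how the margin adapts or how triplets are selected, so the fixed `margin` parameter, the function names `triplet_loss` and `change_score`, and the use of cosine distance as a change score are all illustrative assumptions.

```python
import math


def l2_normalise(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss on unit-normalised embeddings.

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin`. The margin is fixed here; an
    adaptive scheme would replace this constant (assumption: the
    paper's adaptive rule is not given in the abstract).
    """
    a, p, n = (l2_normalise(v) for v in (anchor, positive, negative))
    return max(euclidean(a, p) - euclidean(a, n) + margin, 0.0)


def change_score(emb_before, emb_after):
    """Hypothetical zero-shot change score: cosine distance between the
    embeddings of the same location at two points in time."""
    a, b = l2_normalise(emb_before), l2_normalise(emb_after)
    return 1.0 - sum(x * y for x, y in zip(a, b))
```

In this sketch, a score near 0 means the two embeddings point in the same direction (little visible change), while larger scores indicate a larger shift in the embedding. Any threshold on that score would have to be chosen per application.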