CITYREP: A Unified Benchmark for Urban Representations Across Cities, Tasks, and Modalities

📅 2026-05-25
🤖 AI Summary
Existing evaluations of urban representations cover only a small number of cities and tasks, and their random data splits often induce spatial leakage, which hinders cross-regional generalization and fair model comparison. This work proposes CityRep, described as the first generalization-aware unified benchmark for urban representation learning, built on a spatial-unit-agnostic evaluation framework and a block-based structured partitioning protocol. It provides a multimodal, extensible suite spanning eight cities and eight diverse tasks. Using a standardized alignment module, block-level splits, and a reproducible evaluation pipeline, the study systematically assesses 11 representative models and finds that random splits inflate performance and change model rankings. The results support CityRep as a basis for rigorous cross-location comparison of heterogeneous urban representations.
📝 Abstract
Urban representation learning encodes complex urban environments into general-purpose embeddings for diverse downstream tasks and emerging urban foundation models. However, current evaluations are limited, typically focusing on one or two cities and tasks and relying on random splits that introduce spatial leakage, leading to inflated performance and weak support for cross-location generalization and fair comparison. To address this, we propose CityRep, a unified benchmark that evaluates urban representations across data modalities, cities, and tasks using spatially structured splits. CityRep consists of three key components: (1) a spatial unit-agnostic evaluation framework that supports heterogeneous urban representations through a standardized alignment module; (2) a unified evaluation protocol using block-based spatial splits to mitigate spatial leakage and enable rigorous model comparison; and (3) an extensible multi-city, multi-task benchmark suite spanning 8 cities and 8 tasks across regression, classification, and distribution prediction. We evaluate 11 representative urban representation models. Results show that performance is highly sensitive to the split protocol, with random splits inflating scores and altering model rankings. We also observe substantial variability across cities and tasks, underscoring the need for generalization-aware evaluation. CityRep is released as a reproducible benchmark with datasets, evaluation pipelines, and diagnostic tools to facilitate fair comparison and support future research in urban representation learning towards urban foundation models.
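The block-based spatial split mentioned in the abstract can be illustrated with a minimal sketch. This is not CityRep's actual implementation: the function name `block_split`, the square-grid blocking, the `block_size` and `test_frac` parameters, and the synthetic coordinates are all assumptions made for illustration. The idea shown is only that whole spatial blocks, rather than individual units, are assigned to train or test, so that held-out units are not interleaved with training units inside the same block.

```python
import numpy as np

def block_split(coords, block_size, test_frac=0.2, seed=0):
    """Assign spatial units to train/test by whole grid blocks (illustrative only)."""
    coords = np.asarray(coords, dtype=float)
    # Grid cell (block) index of each unit, measured from the lower-left corner.
    cells = np.floor((coords - coords.min(axis=0)) / block_size).astype(int)
    # Collapse each 2D cell index into a single integer block id.
    _, block_ids = np.unique(cells, axis=0, return_inverse=True)
    block_ids = block_ids.reshape(-1)
    n_blocks = int(block_ids.max()) + 1
    # Hold out a random subset of blocks, never individual units.
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(test_frac * n_blocks)))
    test_blocks = rng.choice(n_blocks, size=n_test, replace=False)
    test_mask = np.isin(block_ids, test_blocks)
    return ~test_mask, test_mask, block_ids

# Synthetic unit centroids in a 10 x 10 area (hypothetical data, not from the paper).
points = np.random.default_rng(1).uniform(0, 10, size=(200, 2))
train_mask, test_mask, block_ids = block_split(points, block_size=2.5)
print("train units:", int(train_mask.sum()), "test units:", int(test_mask.sum()))
```

A random split would instead draw the test mask per unit, so nearly every test unit would have training neighbours in its own block; that is the leakage the abstract says inflates scores.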
Problem

Research questions and friction points this paper is trying to address.

urban representation learning
spatial leakage
cross-city generalization
benchmark evaluation
fair model comparison
Innovation

Methods, ideas, or system contributions that make the work stand out.

urban representation learning
spatial leakage
cross-city generalization
unified benchmark
spatially structured splits
Junyuan Liu
SpaceTimeLab, University College London, UK
Xinglei Wang
PhD Student, University College London
GIScience, Human mobility, Urban analytics, Spatio-temporal data mining
Zichao Zeng
SpaceTimeLab, University College London, UK; 3DIMPact, University College London, UK
Jiazhuang Feng
SpaceTimeLab, University College London, UK
Quan Qin
SpaceTimeLab, University College London, UK; School of Resource and Environmental Sciences, Wuhan University, China
Ilya Ilyankou
SpaceTimeLab, University College London, UK
Guangsheng Dong
SpaceTimeLab, University College London, UK; State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, China
Tao Cheng
Professor in GeoInformatics, University College London
Geographical Information Science, Space-Time Analytics, Smart Cities, GeoComputation, Network Complexity