OSM+: Billion-Level Open Street Map Data Processing System for City-wide Experiments

📅 2025-12-07
🤖 AI Summary
To address the challenges of massive scale, computational complexity, and inconsistent evaluation protocols in global road network data, this paper introduces OSM+, the first open-source billion-node global road graph dataset. Leveraging a 5,000-core cloud cluster, we implement distributed cleaning, fusion, and multimodal spatiotemporal alignment of OpenStreetMap data. Methodologically, we propose a scalable road graph construction framework that integrates graph neural networks with spatial database techniques, enabling efficient geospatial querying and foundation model training. Our contributions are threefold: (1) releasing a new traffic forecasting benchmark covering 31 cities and a large-scale traffic control dataset for six megacities; (2) enabling algorithm validation at the thousand-intersection scale, achieving breakthroughs in multi-agent coordination and system scalability; and (3) substantially expanding experimental scale and evaluation comprehensiveness for urban computing tasks, including traffic prediction, boundary detection, and policy simulation.

📝 Abstract
Road network data provide rich information about cities and thus serve as the basis for a wide range of urban research. However, processing large-volume worldwide road network data requires intensive computing resources, and the processed results can be difficult to unify for testing downstream tasks. Therefore, in this paper, we process OpenStreetMap data with distributed computing on 5,000 cores of cloud services and release a structured worldwide 1-billion-vertex road network graph dataset with high accessibility (open-source and downloadable worldwide) and usability (open-box graph structure and easy spatial query interface). To demonstrate how easily this dataset can be used, we present three illustrative use cases (traffic prediction, city boundary detection, and traffic policy control) and conduct extensive experiments on these three tasks. (1) For the well-investigated traffic prediction task, we release a new benchmark with 31 cities (traffic data processed and combined with our released OSM+ road network dataset), which provides much larger spatial coverage and a more comprehensive evaluation of the compared algorithms than the previously used datasets. This new benchmark pushes the scalability of algorithms from hundreds of road network intersections to thousands. (2) For the more advanced traffic policy control task, which requires interaction with the road network, we release new datasets for 6 cities at a much larger scale than previous datasets. This brings a new challenge of thousand-scale multi-agent coordination. (3) Along with the OSM+ dataset, we release data converters that facilitate the integration of multimodal spatial-temporal data for geospatial foundation model training, thereby expediting the discovery of compelling scientific insights.
Problem

Research questions and friction points this paper is trying to address.

Processes billion-level OpenStreetMap data for city-wide experiments
Releases a structured global road network dataset for accessibility
Provides benchmarks for traffic prediction and policy control tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed cloud computing with 5000 cores
Structured billion-vertex global road network dataset
Open-source converters for multimodal data integration
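
The abstract describes the dataset as an open graph structure with a spatial query interface. The paper's actual interface is not shown on this page, so the following is only a minimal Python sketch of what a bounding-box query over a road graph can look like. The `RoadGraph` class, its method names, and the demo coordinates are all hypothetical and are not taken from OSM+.

```python
from dataclasses import dataclass, field


@dataclass
class RoadGraph:
    """Hypothetical in-memory road graph; not the OSM+ data format."""
    coords: dict = field(default_factory=dict)  # vertex id -> (lon, lat)
    adj: dict = field(default_factory=dict)     # vertex id -> set of neighbor ids

    def add_vertex(self, vid, lon, lat):
        self.coords[vid] = (lon, lat)
        self.adj.setdefault(vid, set())

    def add_edge(self, u, v):
        # Undirected edge for simplicity; real road graphs are often directed.
        self.adj[u].add(v)
        self.adj[v].add(u)

    def bbox_query(self, min_lon, min_lat, max_lon, max_lat):
        """Return sorted ids of vertices inside the bounding box (linear scan)."""
        return sorted(
            vid for vid, (lon, lat) in self.coords.items()
            if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
        )

    def induced_subgraph(self, vertex_ids):
        """Keep only the given vertices and the edges between them."""
        keep = set(vertex_ids)
        sub = RoadGraph()
        for vid in keep:
            sub.add_vertex(vid, *self.coords[vid])
        for vid in keep:
            sub.adj[vid] = self.adj[vid] & keep
        return sub


# Tiny demo graph with made-up coordinates.
g = RoadGraph()
g.add_vertex(1, 121.1, 31.1)
g.add_vertex(2, 121.2, 31.2)
g.add_vertex(3, 122.0, 32.0)
g.add_edge(1, 2)
g.add_edge(2, 3)

inside = g.bbox_query(121.0, 31.0, 121.5, 31.5)
city = g.induced_subgraph(inside)
print(inside, city.adj)
```

The linear scan keeps the sketch short. At the billion-vertex scale the abstract mentions, a spatial index (for example an R-tree or a grid) would presumably replace it, but that is an assumption rather than something stated here.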
Guanjie Zheng
Shanghai Jiao Tong University
Data mining, machine learning
Ziyang Su
Shanghai Jiao Tong University
Yiheng Wang
Shanghai Jiao Tong University
Yuhang Luo
Shanghai Jiao Tong University
Hongwei Zhang
Alibaba Inc.
Xuanhe Zhou
Assistant Professor, Shanghai Jiao Tong University
Data management, artificial intelligence
Linghe Kong
Shanghai Jiao Tong University
Internet of Things, mobile computing, big data
Fan Wu
Shanghai Jiao Tong University
Wen Ling
Shanghai Jiao Tong University