🤖 AI Summary
Heterogeneity across point-of-interest (POI) classification systems impedes interoperability between open and commercial geospatial data sources.
Method: We propose the first systematic mapping framework bridging OpenStreetMap (OSM) community tags and Foursquare’s commercial taxonomy. The framework has three stages: (1) constructing a human-validated open benchmark dataset; (2) performing coarse-grained semantic alignment with pretrained text embeddings; and (3) applying large language models for fine-grained hierarchical refinement and ambiguity resolution. The stages are integrated into an automated pipeline that can be rerun as OSM tags change.
Contributions/Results: (1) We release the first publicly available, reproducible cross-source POI classification mapping benchmark and evaluation toolkit; (2) our method significantly enhances interoperability between heterogeneous POI taxonomies, with empirical validation in urban analytics and mobility modeling demonstrating high mapping accuracy and generalizability; (3) we establish a scalable, adaptive paradigm for cross-platform geospatial information fusion.
📝 Abstract
The heterogeneity of Point of Interest (POI) taxonomies is a persistent challenge for the integration of urban datasets and the development of location-based services. OpenStreetMap (OSM) adopts a flexible, community-driven tagging system, while Foursquare (FS) relies on a curated hierarchical structure. Here we present an openly available benchmark and mapping framework that aligns OSM tags with the FS taxonomy. This resource integrates the richness of community-driven OSM data with the hierarchical structure of FS, enabling reproducible and interoperable urban analytics. The dataset is complemented by an evaluation of embedding and LLM-based alignment strategies and a pipeline that supports scalable updates as OSM evolves. Together, these elements provide both a robust reference resource and a practical tool for the community. Our approach is structured around three components: the construction of a manually curated benchmark as a gold standard, the evaluation of pretrained text embedding models for semantic alignment between OSM tags and FS categories, and an LLM-based refinement stage that enhances robustness and adaptability. The proposed methodology provides a scalable and reproducible solution for taxonomy unification, with direct applications to urban analytics, mobility studies, and smart city services.
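The embedding-based alignment and benchmark evaluation described above can be illustrated with a minimal sketch. This is not the authors' implementation. The `embed` function is a toy bag-of-words stand-in for a pretrained text embedding model, the category labels in `FS_CATEGORIES` are illustrative rather than copied from the actual FS taxonomy, and the `GOLD` mapping is a hypothetical four-entry benchmark. The LLM-based refinement stage is not shown; in a setup like this it would rerank the shortlist returned by `top_k_candidates`.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a pretrained text embedding (assumption): token counts."""
    for sep in ("=", "_", ">"):
        text = text.replace(sep, " ")
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k_candidates(osm_tag, fs_categories, k=3):
    """Return the k FS categories most similar to an OSM tag (coarse alignment)."""
    query = embed(osm_tag)
    scored = [(cosine(query, embed(cat)), cat) for cat in fs_categories]
    # Sort by score only; ties keep their original order (stable sort).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cat for _, cat in scored[:k]]

def top_k_accuracy(gold, fs_categories, k=1):
    """Fraction of gold mappings whose FS category appears in the top-k shortlist."""
    hits = sum(
        1 for tag, cat in gold.items()
        if cat in top_k_candidates(tag, fs_categories, k)
    )
    return hits / len(gold)

# Illustrative labels only (assumption), not actual FS taxonomy entries.
FS_CATEGORIES = [
    "Dining and Drinking > Cafe",
    "Dining and Drinking > Restaurant",
    "Retail > Book Store",
    "Health and Medicine > Pharmacy",
]

# Hypothetical gold-standard benchmark: OSM tag -> FS category.
GOLD = {
    "amenity=cafe": "Dining and Drinking > Cafe",
    "amenity=restaurant": "Dining and Drinking > Restaurant",
    "shop=books": "Retail > Book Store",
    "amenity=pharmacy": "Health and Medicine > Pharmacy",
}

if __name__ == "__main__":
    print(top_k_candidates("amenity=cafe", FS_CATEGORIES, k=1))
    print(top_k_accuracy(GOLD, FS_CATEGORIES, k=1))
```

On this toy data the lexical stand-in misses `shop=books`, because "books" and "Book Store" share no exact token. This is the kind of gap that semantic embeddings and a later refinement stage are meant to close.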