World-POI: Global Point-of-Interest Data Enriched from Foursquare and OpenStreetMap as Tabular and Graph Data

📅 2025-10-24
🤖 AI Summary
Foursquare's global POI data has incomplete metadata and includes some non-existent or fictional locations, while OpenStreetMap (OSM) offers detailed, frequently updated metadata but does not formally verify that a POI is an actual business. This data paper presents a methodology that combines the two sources, using Foursquare as the baseline of commercial POIs and OSM as the source of enriched metadata. Records are linked by computing name similarity scores and spatial distances, and only high-confidence matches are retained. Filtered releases with adjustable thresholds reduce storage needs and keep the data practical to download and use. The filtered data is then used to build a graph-based representation of POIs with attributes from both sources. The combined dataset totals approximately 1 TB and is not publicly released in full; the authors instead provide the filtered releases together with step-by-step instructions to reproduce what the abstract calls the full 631 GB build. The data is offered in both tabular and graph form to support spatial analysis and other downstream applications.

📝 Abstract
Recently, Foursquare released a global dataset with more than 100 million points of interest (POIs), each representing a real-world business on its platform. However, many entries lack complete metadata such as addresses or categories, and some correspond to non-existent or fictional locations. In contrast, OpenStreetMap (OSM) offers a rich, user-contributed POI dataset with detailed and frequently updated metadata, though it does not formally verify whether a POI represents an actual business. In this data paper, we present a methodology that integrates the strengths of both datasets: Foursquare as a comprehensive baseline of commercial POIs and OSM as a source of enriched metadata. The combined dataset totals approximately 1 TB. While this full version is not publicly released, we provide filtered releases with adjustable thresholds that reduce storage needs and make the data practical to download and use across domains. We also provide step-by-step instructions to reproduce the full 631 GB build. Record linkage is achieved by computing name similarity scores and spatial distances between Foursquare and OSM POIs. These measures identify and retain high-confidence matches that correspond to real businesses in Foursquare, have representations in OSM, and show strong name similarity. Finally, we use this filtered dataset to construct a graph-based representation of POIs enriched with attributes from both sources, enabling advanced spatial analyses and a range of downstream applications.
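
The record-linkage step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the similarity metric or the threshold values, so `difflib.SequenceMatcher`, the 0.8 similarity cutoff, the 100 m distance cutoff, the record field names, and the helper names `haversine_m`, `name_similarity`, and `link_pois` are all assumptions made for this example.

```python
import math
from difflib import SequenceMatcher

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def name_similarity(a, b):
    """Similarity in [0, 1]; SequenceMatcher is a stand-in metric (assumption)."""
    return SequenceMatcher(None, a.casefold().strip(), b.casefold().strip()).ratio()


def link_pois(fsq_pois, osm_pois, min_similarity=0.8, max_distance_m=100.0):
    """Keep, for each Foursquare POI, the most name-similar OSM POI nearby.

    Both thresholds are hypothetical defaults; raising min_similarity or
    lowering max_distance_m yields a smaller, stricter release.
    """
    matches = []
    for f in fsq_pois:
        best = None
        for o in osm_pois:
            dist = haversine_m(f["lat"], f["lon"], o["lat"], o["lon"])
            if dist > max_distance_m:
                continue  # too far apart to be the same place
            sim = name_similarity(f["name"], o["name"])
            if sim >= min_similarity and (best is None or sim > best[1]):
                best = (o, sim, dist)
        if best is not None:
            o, sim, dist = best
            matches.append(
                {"fsq_id": f["id"], "osm_id": o["id"], "similarity": sim, "distance_m": dist}
            )
    return matches


# Made-up sample records, for illustration only.
fsq_sample = [
    {"id": "f1", "name": "Corner Cafe", "lat": 37.7763, "lon": -122.4233},
    {"id": "f2", "name": "Imaginary Castle", "lat": 0.0, "lon": 0.0},
]
osm_sample = [
    {"id": "o1", "name": "corner cafe", "lat": 37.7764, "lon": -122.4232},
    {"id": "o2", "name": "Some Bakery", "lat": 37.7763, "lon": -122.4233},
]
matches = link_pois(fsq_sample, osm_sample)
```

The nested loop compares every pair and is only suitable for small samples; at the scale of 100 million POIs a spatial index or grid bucketing would be needed to limit candidate pairs, which this sketch leaves out.
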
Problem

Research questions and friction points this paper is trying to address.

Enhancing incomplete POI metadata from Foursquare using OpenStreetMap enrichment
Identifying real businesses by matching Foursquare and OSM spatial data
Constructing attributed graph representations for advanced spatial analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Foursquare as the baseline of commercial POIs with OpenStreetMap as the source of enriched metadata
Links records using name similarity and spatial distance
Constructs graph-based representation with enriched attributes
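
As a rough illustration of the last point, the sketch below merges attributes from a linked Foursquare/OSM pair into one node and connects nodes that lie close together. The summary does not say how edges are defined, so the proximity rule, the 200 m radius, the normalized attribute keys, and the helper names `merge_attributes`, `distance_m`, and `build_poi_graph` are assumptions for this example only.

```python
import math
from itertools import combinations


def merge_attributes(fsq_record, osm_tags):
    """Start from the Foursquare record and fill missing fields from OSM."""
    merged = dict(fsq_record)
    for key, value in osm_tags.items():
        if merged.get(key) in (None, ""):
            merged[key] = value  # OSM only fills gaps; it never overwrites
    return merged


def distance_m(a, b):
    """Haversine distance in metres between two node records."""
    phi1, phi2 = math.radians(a["lat"]), math.radians(b["lat"])
    dphi = phi2 - phi1
    dlmb = math.radians(b["lon"] - a["lon"])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6_371_000.0 * math.asin(math.sqrt(h))


def build_poi_graph(pois, radius_m=200.0):
    """Return (nodes, edges); an edge joins two POIs within radius_m (assumed rule)."""
    nodes = {poi["id"]: poi for poi in pois}
    edges = []
    for a, b in combinations(pois, 2):
        d = distance_m(a, b)
        if d <= radius_m:
            edges.append((a["id"], b["id"], round(d, 1)))
    return nodes, edges


# Made-up sample records, for illustration only.
cafe = merge_attributes(
    {"id": "f1", "name": "Corner Cafe", "lat": 37.7763, "lon": -122.4233, "address": None},
    {"name": "corner cafe", "address": "1 Main St", "opening_hours": "Mo-Fr 08:00-18:00"},
)
shop = {"id": "f2", "name": "Book Nook", "lat": 37.7764, "lon": -122.4232}
far = {"id": "f3", "name": "Hill Diner", "lat": 37.8000, "lon": -122.4200}
nodes, edges = build_poi_graph([cafe, shop, far])
```

Letting Foursquare values take precedence mirrors its role as the baseline, with OSM supplying only what is missing; a different precedence rule would be equally easy to express in `merge_attributes`.
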