🤖 AI Summary
Most publicly available human mobility datasets lack contextual and socio-semantic information, which hinders multimodal modeling and semantic analysis. To address this, we introduce two large-scale, semantically enriched trajectory datasets, Paris-Mobility and NYC-Mobility, that combine real-world GPS trajectories with stop/move segments, POIs, inferred transportation modes, weather data, and synthetic social media posts generated by large language models. Using Semantic Web technologies, we build RDF-based knowledge graphs from these multimodal data. All resources follow the FAIR principles and are released in both tabular and RDF formats, and the end-to-end data curation pipeline is open source. The datasets support downstream tasks including human behavior modeling, mobility prediction, cross-modal reasoning, and knowledge graph research, while making mobility data easier to interpret semantically and to reuse in AI applications. They can also serve as infrastructure for smart city analytics and embodied intelligence research.
📝 Abstract
In this resource paper, we present two publicly available datasets of semantically enriched human trajectories, together with the pipeline used to build them. The trajectories are public GPS traces retrieved from OpenStreetMap. Each dataset includes contextual layers such as stops, moves, points of interest (POIs), inferred transportation modes, and weather data. A novel semantic feature is the inclusion of synthetic, realistic social media posts generated by Large Language Models (LLMs), which enables multimodal and semantic mobility analysis. The datasets are available in both tabular and Resource Description Framework (RDF) formats, supporting semantic reasoning and FAIR data practices. They cover two large, structurally distinct cities: Paris and New York. Our open-source, reproducible pipeline allows the datasets to be customized, and the datasets support research tasks such as behavior modeling, mobility prediction, knowledge graph construction, and LLM-based applications. To our knowledge, this resource is the first to combine real-world movement, structured semantic enrichment, LLM-generated text, and Semantic Web compatibility in a reusable framework.
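To make the relationship between the tabular and RDF releases concrete, the following is a minimal sketch of how one enriched stop record could be serialized as N-Triples. It is not the authors' pipeline or schema: the `Stop` fields, the `http://example.org/mobility#` namespace, the property names, and the sample values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical namespace; the vocabulary actually used by the datasets may differ.
EX = "http://example.org/mobility#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
XSD_DOUBLE = "<http://www.w3.org/2001/XMLSchema#double>"


@dataclass
class Stop:
    """One row of an assumed tabular 'stops' layer (field names are illustrative)."""
    stop_id: str
    trajectory_id: str
    lat: float
    lon: float
    poi_name: str
    weather: str
    post_text: str


def literal(value: str) -> str:
    """Escape a Python string as a plain N-Triples string literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def stop_to_ntriples(stop: Stop) -> list[str]:
    """Map one tabular stop record to a list of N-Triples statements."""
    subject = f"<{EX}stop/{stop.stop_id}>"
    return [
        f"{subject} {RDF_TYPE} <{EX}Stop> .",
        f"{subject} <{EX}partOfTrajectory> <{EX}trajectory/{stop.trajectory_id}> .",
        f'{subject} <{EX}latitude> "{stop.lat}"^^{XSD_DOUBLE} .',
        f'{subject} <{EX}longitude> "{stop.lon}"^^{XSD_DOUBLE} .',
        f"{subject} <{EX}nearPOI> {literal(stop.poi_name)} .",
        f"{subject} <{EX}weather> {literal(stop.weather)} .",
        f"{subject} <{EX}socialPost> {literal(stop.post_text)} .",
    ]


# Made-up sample record, not taken from the datasets.
example = Stop("s1", "t42", 48.8584, 2.2945, "Eiffel Tower", "light rain",
               'Made it to the top, "wow"!')
for triple in stop_to_ntriples(example):
    print(triple)
```

In this sketch each contextual layer (POI, weather, generated post) becomes one property of the stop resource, so the same record can be read as a table row or queried as a small graph.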