Human Mobility Datasets Enriched With Contextual and Social Dimensions

📅 2025-09-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing publicly available human mobility datasets largely lack contextual and socio-semantic information, which hinders multimodal modeling and semantic analysis. To address this, we introduce two large-scale, semantically enriched trajectory datasets, Paris-Mobility and NYC-Mobility, that integrate real-world GPS trajectories, stop/move segments, POIs, inferred transportation modes, weather data, and synthetic social media posts generated by large language models. Using Semantic Web technologies, we construct RDF-based knowledge graphs from these multimodal data. All resources are released in both tabular and RDF formats and support FAIR data practices. We also open-source the end-to-end data curation pipeline, so the datasets can be reproduced and customized. The resource supports downstream tasks including human behavior modeling, mobility prediction, cross-modal reasoning, and knowledge graph research, and it makes mobility data easier to interpret semantically and to reuse in AI applications. The datasets are intended as infrastructure for smart city analytics and embodied intelligence research.

📝 Abstract

In this resource paper, we present two publicly available datasets of semantically enriched human trajectories, together with the pipeline to build them. The trajectories are publicly available GPS traces retrieved from OpenStreetMap. Each dataset includes contextual layers such as stops, moves, points of interest (POIs), inferred transportation modes, and weather data. A novel semantic feature is the inclusion of synthetic, realistic social media posts generated by Large Language Models (LLMs), enabling multimodal and semantic mobility analysis. The datasets are available in both tabular and Resource Description Framework (RDF) formats, supporting semantic reasoning and FAIR data practices. They cover two structurally distinct, large cities: Paris and New York. Our open source reproducible pipeline allows for dataset customization, while the datasets support research tasks such as behavior modeling, mobility prediction, knowledge graph construction, and LLM-based applications. To our knowledge, our resource is the first to combine real-world movement, structured semantic enrichment, LLM-generated text, and semantic web compatibility in a reusable framework.
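To make the "tabular and RDF" pairing concrete, the following is a minimal sketch of how one row of a tabular stops layer could be serialized as RDF in N-Triples syntax. It is an illustration only, not the authors' pipeline: the `Stop` fields, the `http://example.org/mobility/` namespace, and the property names are all hypothetical, and the actual datasets may use a different schema and vocabulary.

```python
from dataclasses import dataclass

# Hypothetical namespace and vocabulary; the real datasets may differ.
EX = "http://example.org/mobility/"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Stop:
    """One row of a hypothetical tabular 'stops' layer."""
    stop_id: str
    trajectory_id: str
    lat: float
    lon: float
    poi_name: str
    weather: str


def literal(value, datatype=None):
    """Serialize a value as an N-Triples literal, escaping special characters."""
    text = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    if datatype:
        return f'"{text}"^^<{XSD}{datatype}>'
    return f'"{text}"'


def stop_to_ntriples(stop):
    """Return the N-Triples lines that describe one stop."""
    subject = f"<{EX}stop/{stop.stop_id}>"
    triples = [
        (RDF_TYPE, f"<{EX}Stop>"),
        (f"{EX}partOf", f"<{EX}trajectory/{stop.trajectory_id}>"),
        (f"{EX}lat", literal(stop.lat, "double")),
        (f"{EX}lon", literal(stop.lon, "double")),
        (f"{EX}nearPoi", literal(stop.poi_name)),
        (f"{EX}weather", literal(stop.weather)),
    ]
    return [f"{subject} <{pred}> {obj} ." for pred, obj in triples]


if __name__ == "__main__":
    example = Stop("s1", "t42", 48.8584, 2.2945, "Tour Eiffel", "rain")
    print("\n".join(stop_to_ntriples(example)))
```

Each tabular column becomes one triple about the stop, which is what lets the same record be queried either as a table row or as part of a knowledge graph.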

Problem

Research questions and friction points this paper is trying to address.

Enriching human mobility data with contextual and semantic dimensions

Generating synthetic social media posts using Large Language Models

Creating reusable datasets for multimodal mobility analysis applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Enriching GPS trajectories with contextual and semantic layers

Generating synthetic social media posts using Large Language Models

Providing datasets in tabular and RDF formats for reasoning
