Harnessing Rich Multi-Modal Data for Spatial-Temporal Homophily-Embedded Graph Learning Across Domains and Localities

πŸ“… 2025-12-11
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Urban data exhibit strong heterogeneity and multi-source dispersion, hindering intelligent cross-regional and cross-task decision-making. To address this, we propose a heterogeneous multi-source urban data fusion framework. Our method introduces a spatiotemporal graph neural network that explicitly models spatial variability and homogeneity as a graph learning priorβ€”novel in urban AI. We further design a multimodal alignment embedding module and a federated heterogeneous data fusion mechanism, enabling unified representation of over 50 cross-domain, multimodal data sources (e.g., ride-hailing, traffic accidents, crime). Crucially, the framework supports zero-shot transfer: it adapts to unseen cities or tasks without retraining, requiring fine-tuning of fewer than 5% of parameters. Evaluated across five real-world city scenarios, it achieves an average 32% reduction in MAE over state-of-the-art baselines, demonstrating superior generalizability and practical utility.

Technology Category

Application Category

πŸ“ Abstract
Modern cities are increasingly reliant on data-driven insights to support decision making in areas such as transportation, public safety and environmental impact. However, city-level data often exists in heterogeneous formats, collected independently by local agencies with diverse objectives and standards. Despite their numerous, wide-ranging, and uniformly consumable nature, national-level datasets exhibit significant heterogeneity and multi-modality. This research proposes a heterogeneous data pipeline that performs cross-domain data fusion over time-varying, spatial-varying and spatial-varying time-series datasets. We aim to address complex urban problems across multiple domains and localities by harnessing the rich information over 50 data sources. Specifically, our data-learning module integrates homophily from spatial-varying dataset into graph-learning, embedding information of various localities into models. We demonstrate the generalizability and flexibility of the framework through five real-world observations using a variety of publicly accessible datasets (e.g., ride-share, traffic crash, and crime reports) collected from multiple cities. The results show that our proposed framework demonstrates strong predictive performance while requiring minimal reconfiguration when transferred to new localities or domains. This research advances the goal of building data-informed urban systems in a scalable way, addressing one of the most pressing challenges in smart city analytics.
Problem

Research questions and friction points this paper is trying to address.

Integrates heterogeneous multi-modal urban data across domains
Fuses spatial-temporal datasets for cross-domain urban problem solving
Embeds locality-specific homophily into graph learning for scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-domain data fusion for heterogeneous multi-modal datasets
Homophily-embedded graph learning integrating spatial-temporal information
Scalable framework transferable across localities with minimal reconfiguration
πŸ”Ž Similar Papers
No similar papers found.