🤖 AI Summary
Urban multimodal data, including news reports, CCTV imagery, air quality measurements, weather records, and traffic flows, are highly fragmented, making manual modeling of cross-source event correlations inefficient and poorly scalable, and thereby hindering causal analysis and evolutionary forecasting of urban incidents. To address this, we propose a large language model (LLM)-driven approach for constructing dynamic semantic knowledge graphs. Our method leverages LLMs to automatically uncover latent semantic relationships between heterogeneous urban data and emergent incidents without handcrafted rules. By integrating spatiotemporally aligned multimodal data, it enables fully automated, incremental knowledge graph construction. Evaluated across five distinct data modalities, the approach establishes robust event associations, significantly improving event detection accuracy and causal reasoning capability. It delivers an interpretable, evolvable semantic infrastructure for real-time situational awareness, root-cause attribution, and scale prediction of urban emergencies.
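To make the core idea concrete, here is a minimal Python sketch of how an LLM might be asked to propose a typed link between a spatiotemporally aligned observation and a candidate incident. This is not the paper's implementation: the prompt format, the relation vocabulary, and the `call_llm` wrapper are all assumptions standing in for whichever LLM interface and schema SIGMUS actually uses.

```python
# Hypothetical sketch: prompt an LLM to judge whether an aligned observation
# relates to an incident, returning a typed edge for the knowledge graph.
# `call_llm` is an assumed wrapper around any chat-completion API.
import json
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Observation:
    modality: str   # e.g. "air_quality", "cctv_caption", "traffic"
    summary: str    # text rendering of the raw measurement or image caption
    time: str       # ISO-8601 timestamp
    location: str   # e.g. a neighborhood or road-segment name

def propose_edge(incident: str, obs: Observation,
                 call_llm: Callable[[str], str]) -> Optional[dict]:
    """Ask the LLM for a typed relation between an incident and an observation."""
    prompt = (
        f"Incident: {incident}\n"
        f"Observation ({obs.modality}) at {obs.time}, {obs.location}: {obs.summary}\n"
        'Reply with JSON {"related": bool, "relation": str, "rationale": str}, '
        'where "relation" is one of: evidence_of, cause_of, effect_of, unrelated.'
    )
    reply = json.loads(call_llm(prompt))
    if reply.get("related"):
        return {"incident": incident, "observation": obs.summary,
                "relation": reply["relation"], "rationale": reply["rationale"]}
    return None  # no edge: the graph only grows where the LLM finds a link
```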
📝 Abstract
Modern urban spaces are equipped with an increasingly diverse set of sensors, all producing an abundance of multimodal data. Such multimodal data can be used to identify and reason about important incidents occurring in urban landscapes, such as major emergencies, cultural and social events, as well as natural disasters. However, such data may be fragmented over several sources and difficult to integrate due to the reliance on human-driven reasoning for identifying relationships between the multimodal data corresponding to an incident, as well as for understanding the different components that define an incident. These relationships and components are critical to identifying the causes of incidents, as well as to forecasting the scale and intensity of future incidents as they begin to develop. In this work, we create SIGMUS, a system for Semantic Integration for Knowledge Graphs in Multimodal Urban Spaces. SIGMUS uses Large Language Models (LLMs) to produce the necessary world knowledge for identifying relationships between incidents occurring in urban spaces and data from different modalities, allowing us to organize evidence and observations relevant to an incident without relying on human-encoded rules for relating multimodal sensory data with incidents. This organized knowledge is represented as a knowledge graph that links incidents, observations, and related entities. We find that our system is able to produce reasonable connections between 5 different data sources (news article text, CCTV images, air quality, weather, and traffic measurements) and relevant incidents occurring at the same time and location.
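The following sketch shows how edges like those returned by `propose_edge` above could be assembled into an incremental knowledge graph. It is an illustration under stated assumptions, not SIGMUS's actual storage layer: `networkx` stands in for whatever graph store the system uses, and the node and edge attributes are hypothetical.

```python
# Hypothetical incremental assembly of the knowledge graph, assuming edges
# shaped like the dicts returned by propose_edge() in the sketch above.
import networkx as nx

def add_to_graph(kg: nx.MultiDiGraph, edge: dict) -> None:
    """Insert incident/observation nodes and a typed, attributed edge."""
    kg.add_node(edge["incident"], kind="incident")
    kg.add_node(edge["observation"], kind="observation")
    kg.add_edge(edge["observation"], edge["incident"],
                relation=edge["relation"], rationale=edge["rationale"])

kg = nx.MultiDiGraph()
# For each new batch of observations: align by time and place against open
# incidents, query the LLM, and keep only the edges it accepts, e.g.:
#     edge = propose_edge(incident, obs, call_llm)
#     if edge is not None:
#         add_to_graph(kg, edge)
```

Storing the LLM's rationale on each edge is one simple way to keep the resulting graph interpretable: every incident-observation link carries the stated reason it was created.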