Learning Multimodal Embeddings for Traffic Accident Prediction and Causal Estimation

📅 2025-12-02

📈 Citations: 0

✨ Influential: 0

career value

214K/year

🤖 AI Summary

Prior traffic accident prediction models largely neglect physical road surface and environmental attributes, limiting causal interpretability and predictive accuracy. Method: We propose a multimodal causal forecasting framework integrating road network topology, high-resolution satellite imagery, weather conditions, road type, and traffic flow. We design the first graph neural network architecture jointly modeling road network graphs and satellite-derived visual features, augmented with multimodal embedding fusion and causal matching estimation to quantify independent causal effects of precipitation, vehicle speed, and seasonality. Contribution/Results: Our model achieves a mean AUROC of 90.1%, outperforming a graph-only baseline by 3.7%. Attribution analysis reveals that precipitation, highway segments, and seasonal factors independently increase accident risk by 24%, 22%, and 29%, respectively—providing actionable, interpretable causal insights for targeted traffic safety interventions.

Technology Category

Application Category

📝 Abstract

We consider analyzing traffic accident patterns using both road network data and satellite images aligned to road graph nodes. Previous work for predicting accident occurrences relies primarily on road network structural features while overlooking physical and environmental information from the road surface and its surroundings. In this work, we construct a large multimodal dataset across six U.S. states, containing nine million traffic accident records from official sources, and one million high-resolution satellite images for each node of the road network. Additionally, every node is annotated with features such as the region's weather statistics and road type (e.g., residential vs. motorway), and each edge is annotated with traffic volume information (i.e., Average Annual Daily Traffic). Utilizing this dataset, we conduct a comprehensive evaluation of multimodal learning methods that integrate both visual and network embeddings. Our findings show that integrating both data modalities improves prediction accuracy, achieving an average AUROC of $90.1%$, which is a $3.7%$ gain over graph neural network models that only utilize graph structures. With the improved embeddings, we conduct a causal analysis based on a matching estimator to estimate the key contributing factors influencing traffic accidents. We find that accident rates rise by $24%$ under higher precipitation, by $22%$ on higher-speed roads such as motorways, and by $29%$ due to seasonal patterns, after adjusting for other confounding factors. Ablation studies confirm that satellite imagery features are essential for achieving accurate prediction.

Problem

Research questions and friction points this paper is trying to address.

Integrates road network and satellite imagery for accident prediction

Estimates causal factors like weather and road type on accidents

Improves prediction accuracy over graph-only models using multimodal data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates satellite images with road network data for multimodal learning

Uses graph neural networks enhanced by visual and environmental features

Applies causal analysis with matching estimators to identify accident factors

🔎 Similar Papers

Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding