Urban Incident Prediction with Graph Neural Networks: Integrating Government Ratings and Crowdsourced Reports

📅 2025-06-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the prediction of urban incidents, such as potholes and rodent infestations, by combining two heterogeneous data sources: government inspection ratings, which are unbiased but sparse, and crowdsourced citizen reports, which are dense but biased by heterogeneous reporting behavior. The authors propose a multi-view, multi-output graph neural network (GNN) that uses both sources to predict the true latent state of incidents in each neighborhood. They also quantify demographic biases in crowdsourced reporting; for example, higher-income neighborhoods report problems at higher rates. On three years of New York City data (9.61 million reports and 1.04 million inspection ratings across 139 incident types), the model predicts the latent state better than models that use only reports or only ratings, especially when rating data is sparse and reports are predictive of ratings. Semi-synthetic experiments indicate that performance remains robust even when government inspection coverage drops to 1% (99% sparsity). The approach applies more broadly to latent state prediction from heterogeneous, sparse, and biased data.

📝 Abstract

Graph neural networks (GNNs) are widely used in urban spatiotemporal forecasting, such as predicting infrastructure problems. In this setting, government officials wish to know in which neighborhoods incidents like potholes or rodent issues occur. The true state of incidents (e.g., street conditions) for each neighborhood is observed via government inspection ratings. However, these ratings are only conducted for a sparse set of neighborhoods and incident types. We also observe the state of incidents via crowdsourced reports, which are more densely observed but may be biased due to heterogeneous reporting behavior. First, for such settings, we propose a multiview, multioutput GNN-based model that uses both unbiased rating data and biased reporting data to predict the true latent state of incidents. Second, we investigate a case study of New York City urban incidents and collect, standardize, and make publicly available a dataset of 9,615,863 crowdsourced reports and 1,041,415 government inspection ratings over 3 years and across 139 types of incidents. Finally, we show on both real and semi-synthetic data that our model can better predict the latent state compared to models that use only reporting data or models that use only rating data, especially when rating data is sparse and reports are predictive of ratings. We also quantify demographic biases in crowdsourced reporting, e.g., higher-income neighborhoods report problems at higher rates. Our analysis showcases a widely applicable approach for latent state prediction using heterogeneous, sparse, and biased data.
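The multiview, multioutput idea in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture, which the abstract does not specify. It assumes a single round of mean aggregation over a neighborhood graph, a shared latent score per neighborhood, a rating head that reads the latent score directly, and a report head that adds a demographic offset. All names (`forward`, `w_latent`, `w_bias`, `masked_bce`) and the toy inputs are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mean_aggregate(features, adjacency):
    """One message-passing round: average each node's features with its neighbors'."""
    out = []
    for node, neighbors in enumerate(adjacency):
        group = [node] + list(neighbors)
        dim = len(features[node])
        out.append([sum(features[j][d] for j in group) / len(group) for d in range(dim)])
    return out

def forward(features, adjacency, demographics, w_latent, w_bias):
    """Return (p_rating, p_report) for every neighborhood."""
    hidden = mean_aggregate(features, adjacency)
    p_rating, p_report = [], []
    for h, demo in zip(hidden, demographics):
        latent = sum(w * x for w, x in zip(w_latent, h))   # shared latent incident score
        p_rating.append(sigmoid(latent))                   # unbiased view (inspection rating)
        p_report.append(sigmoid(latent + w_bias * demo))   # biased view (crowdsourced report)
    return p_rating, p_report

def masked_bce(probs, labels, eps=1e-12):
    """Binary cross-entropy over observed labels only; None marks an unobserved label."""
    terms = []
    for p, y in zip(probs, labels):
        if y is None:
            continue
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        terms.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(terms) / len(terms) if terms else 0.0

def joint_loss(p_rating, p_report, ratings, reports):
    """Sum of the two views' losses; sparse ratings contribute only where observed."""
    return masked_bce(p_rating, ratings) + masked_bce(p_report, reports)

# Toy path graph of three neighborhoods (made-up numbers).
adjacency = [[1], [0, 2], [1]]
features = [[1.0], [0.0], [-1.0]]
demographics = [1.0, 0.0, -1.0]   # hypothetical standardized income
ratings = [1, None, None]         # only one neighborhood was inspected
reports = [1, 0, 0]               # reports are observed everywhere

p_rating, p_report = forward(features, adjacency, demographics, w_latent=[2.0], w_bias=0.5)
loss = joint_loss(p_rating, p_report, ratings, reports)
```

Because both heads share the latent score, densely observed reports can inform the latent state in neighborhoods with no inspection, while the separate offset keeps reporting bias out of the rating prediction. In practice the weights would be trained by gradient descent in a GNN library rather than set by hand.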

Problem

Research questions and friction points this paper is trying to address.

Predict urban incidents using biased crowdsourced and sparse government data

Develop GNN model integrating multi-source data for latent state prediction

Address reporting biases in crowdsourced urban incident data
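One simple way to picture the reporting-bias question is to compare report rates across demographic groups among neighborhoods where an inspection confirmed a problem. The sketch below does only that; it is an assumption-laden illustration, not the paper's estimation procedure, and the function name `report_rate_by_group` and the toy records are hypothetical.

```python
from collections import defaultdict

def report_rate_by_group(records):
    """Among neighborhoods with a confirmed problem, the share that filed a report, per group."""
    filed = defaultdict(int)
    total = defaultdict(int)
    for group, has_problem, reported in records:
        if not has_problem:
            continue  # condition on the inspected ground truth
        total[group] += 1
        filed[group] += int(reported)
    return {group: filed[group] / total[group] for group in total}

# Made-up records: (income_group, inspection_found_problem, resident_reported)
toy_records = [
    ("higher_income", True, True),
    ("higher_income", True, True),
    ("higher_income", True, False),
    ("higher_income", False, False),
    ("lower_income", True, True),
    ("lower_income", True, False),
    ("lower_income", True, False),
    ("lower_income", False, True),
]

rates = report_rate_by_group(toy_records)
```

A gap between groups at the same inspected state is what makes raw report counts a biased proxy for the true incident state.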

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiview GNN model integrates government and crowdsourced data

Public NYC dataset released: 9,615,863 crowdsourced reports and 1,041,415 government inspection ratings over 3 years and across 139 incident types

Model outperforms reports-only and ratings-only approaches, especially when rating data is sparse and reports are predictive of ratings
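A sparse-rating setting can be simulated by hiding most of the observed ratings before training. The sketch below shows one way to do that, assuming ratings are stored in a list with `None` for unobserved entries. The helper `mask_ratings` and its parameters are hypothetical and are not taken from the paper's experimental code.

```python
import random

def mask_ratings(ratings, keep_fraction, seed=0):
    """Hide all but a random keep_fraction of the observed ratings (hidden entries become None)."""
    observed = [i for i, r in enumerate(ratings) if r is not None]
    # Keep at least one rating so the supervised signal never vanishes entirely.
    n_keep = max(1, round(keep_fraction * len(observed))) if observed else 0
    kept = set(random.Random(seed).sample(observed, n_keep))
    return [r if i in kept else None for i, r in enumerate(ratings)]

# Made-up ratings for 200 neighborhoods, all observed.
full = [i % 2 for i in range(200)]
sparse = mask_ratings(full, keep_fraction=0.01)
```

The hidden ratings can then serve as held-out targets for evaluating how well a model recovers the latent state from the remaining ratings plus reports.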

🔎 Similar Papers

Urban Safety Perception Assessments via Integrating Multimodal Large Language Models with Street View Images