Urban Incident Prediction with Graph Neural Networks: Integrating Government Ratings and Crowdsourced Reports

📅 2025-06-10
🤖 AI Summary
This study addresses the prediction of urban governance incidents, such as potholes and rodent infestations, by fusing two heterogeneous data sources: government inspection ratings, which are unbiased but sparse, and crowdsourced citizen reports, which are dense but biased by heterogeneous reporting behavior. We propose a multiview, multioutput graph neural network (GNN) that models spatial dependencies among neighborhoods to infer the true latent state of incidents, including in neighborhoods that have no rating. We also quantify demographic biases in crowdsourced reporting; for example, higher-income neighborhoods report problems at higher rates. On three years of New York City data (about 9.6 million reports and 1.04 million inspection ratings across 139 incident types), the model predicts the latent state better than baselines that use only reports or only ratings. Semi-synthetic experiments indicate that performance remains robust even when government inspection coverage drops to 1%. The approach applies more broadly to latent state prediction from heterogeneous, sparse, and biased civic data.

📝 Abstract
Graph neural networks (GNNs) are widely used in urban spatiotemporal forecasting, such as predicting infrastructure problems. In this setting, government officials wish to know in which neighborhoods incidents like potholes or rodent issues occur. The true state of incidents (e.g., street conditions) for each neighborhood is observed via government inspection ratings. However, these ratings are only conducted for a sparse set of neighborhoods and incident types. We also observe the state of incidents via crowdsourced reports, which are more densely observed but may be biased due to heterogeneous reporting behavior. First, for such settings, we propose a multiview, multioutput GNN-based model that uses both unbiased rating data and biased reporting data to predict the true latent state of incidents. Second, we investigate a case study of New York City urban incidents and collect, standardize, and make publicly available a dataset of 9,615,863 crowdsourced reports and 1,041,415 government inspection ratings over 3 years and across 139 types of incidents. Finally, we show on both real and semi-synthetic data that our model can better predict the latent state compared to models that use only reporting data or models that use only rating data, especially when rating data is sparse and reports are predictive of ratings. We also quantify demographic biases in crowdsourced reporting, e.g., higher-income neighborhoods report problems at higher rates. Our analysis showcases a widely applicable approach for latent state prediction using heterogeneous, sparse, and biased data.
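The idea of sharing one graph representation across a sparse rating signal and a dense reporting signal can be illustrated with a small sketch. The code below is not the authors' architecture: it assumes a single standard graph-convolution layer, two sigmoid output heads (one for ratings, one for reports), binary targets, and a masked cross-entropy loss so that unrated neighborhoods contribute nothing to the rating term. All names (`forward`, `masked_bce`, `w_shared`, and so on) and the toy five-node graph are hypothetical.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetrically normalize A + I, as in a standard graph convolution."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(adj, features, params):
    """One shared graph-convolution layer followed by two output heads."""
    a_norm = normalize_adjacency(adj)
    hidden = np.maximum(a_norm @ features @ params["w_shared"], 0.0)  # ReLU
    rating_prob = sigmoid(hidden @ params["w_rating"])  # head for the latent state
    report_prob = sigmoid(hidden @ params["w_report"])  # head for crowdsourced reports
    return rating_prob.ravel(), report_prob.ravel()

def masked_bce(prob, target, mask):
    """Binary cross-entropy averaged over observed entries only."""
    eps = 1e-9
    loss = -(target * np.log(prob + eps) + (1 - target) * np.log(1 - prob + eps))
    return float((loss * mask).sum() / max(mask.sum(), 1))

# Toy data (assumed): five neighborhoods arranged in a ring.
rng = np.random.default_rng(0)
n_nodes, n_feats, n_hidden = 5, 3, 4
adj = np.array([[0, 1, 0, 0, 1],
                [1, 0, 1, 0, 0],
                [0, 1, 0, 1, 0],
                [0, 0, 1, 0, 1],
                [1, 0, 0, 1, 0]], dtype=float)
features = rng.normal(size=(n_nodes, n_feats))
params = {
    "w_shared": rng.normal(size=(n_feats, n_hidden)),
    "w_rating": rng.normal(size=(n_hidden, 1)),
    "w_report": rng.normal(size=(n_hidden, 1)),
}

ratings = np.array([1, 0, 0, 0, 0], dtype=float)
rating_mask = np.array([1, 1, 0, 0, 0], dtype=float)  # only two neighborhoods are rated
reports = np.array([1, 0, 1, 0, 1], dtype=float)
report_mask = np.ones(n_nodes)                        # reports observed everywhere

rating_prob, report_prob = forward(adj, features, params)
total_loss = (masked_bce(rating_prob, ratings, rating_mask)
              + masked_bce(report_prob, reports, report_mask))
```

Because both heads read the same hidden representation, gradients from the densely observed reports would also shape the representation used for the sparsely observed ratings, which is the intuition behind combining the two sources. Training, multiple incident types, and any bias correction are omitted here.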
Problem

Research questions and friction points this paper is trying to address.

Predict urban incidents using biased crowdsourced and sparse government data
Develop GNN model integrating multi-source data for latent state prediction
Address reporting biases in crowdsourced urban incident data
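One simple way to look for reporting bias, sketched here under stated assumptions, is to compare reporting rates across demographic groups while holding the inspected rating fixed. This is not necessarily the analysis used in the paper: the function `report_rate_by_group`, the binary coding (rating 1 means a problem was found), the binary income split, and the toy arrays are all hypothetical.

```python
import numpy as np

def report_rate_by_group(reported, rating, high_income):
    """Reporting rate per (rating, income group) cell, over rated neighborhoods."""
    rates = {}
    for r in np.unique(rating):
        for g in (False, True):
            cell = (rating == r) & (high_income == g)
            if cell.any():
                rates[(int(r), g)] = float(reported[cell].mean())
    return rates

# Toy data (assumed): eight rated neighborhoods.
rating = np.array([1, 1, 1, 1, 0, 0, 0, 0])          # 1 = inspection found a problem
high_income = np.array([True, True, False, False,
                        True, True, False, False])
reported = np.array([1, 1, 1, 0, 1, 0, 0, 0])         # 1 = at least one report filed

rates = report_rate_by_group(reported, rating, high_income)

# Gap in reporting rate between income groups when a problem is actually present.
gap_given_problem = rates[(1, True)] - rates[(1, False)]
```

A positive gap at a fixed rating suggests that the groups report at different rates for the same underlying condition, which is the kind of pattern the abstract describes for higher-income neighborhoods. With real data, one would also want confidence intervals and finer demographic covariates.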
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiview GNN model integrates government and crowdsourced data
Public NYC dataset released: 9,615,863 crowdsourced reports and 1,041,415 government inspection ratings across 139 incident types
Model outperforms single-source approaches on sparse data
Sidhika Balachandar
Department of Computer Science, Cornell Tech
Shuvom Sadhuka
CSAIL, Massachusetts Institute of Technology
Bonnie Berger
MIT
Bioinformatics
Emma Pierson
University of California, Berkeley
Machine learning, Statistics, Data science, Healthcare, Inequality
Nikhil Garg
Department of Operations Research, Jacobs Technion-Cornell Institute, Cornell Tech