Evaluating the effects of Data Sparsity on the Link-level Bicycling Volume Estimation: A Graph Convolutional Neural Network Approach

📅 2024-10-11
🏛️ arXiv.org
🤖 AI Summary
This study addresses the low accuracy of link-level bicycling volume estimation caused by sparse data and complex mobility patterns. We present what is, to the best of our knowledge, the first graph convolutional network (GCN) model for link-level bicycling volume estimation, validated with Strava Metro data across 15,933 road segments in the City of Melbourne. A key contribution is a systematic evaluation of how data sparsity (0% to 99%) affects model performance. The GCN outperforms baseline models, including linear regression, support vector machines, and random forest, and remains robust up to 80% sparsity, but its performance declines sharply beyond that threshold. The findings give city planners a data-driven basis for bicycling infrastructure decisions and point to the need for methods that hold up under highly sparse observations.

📝 Abstract
Accurate bicycling volume estimation is crucial for making informed decisions about, and planning, future investments in bicycling infrastructure. Traditional link-level volume estimation models are effective for motorized traffic, but they face significant challenges in the bicycling context because of sparse data and the intricate nature of bicycling mobility patterns. To the best of our knowledge, we present the first study to use a Graph Convolutional Network (GCN) architecture to model link-level bicycling volumes and to systematically investigate the impact of varying levels of data sparsity (0%–99%) on model performance, simulating real-world scenarios. We use Strava Metro data as the primary source of bicycling counts across 15,933 road segments/links in the City of Melbourne, Australia. To evaluate the effectiveness of the GCN model, we benchmark it against traditional machine learning models, such as linear regression, support vector machines, and random forest. Our results show that the GCN model outperforms these traditional models in predicting Annual Average Daily Bicycle (AADB) counts, demonstrating its ability to capture the spatial dependencies inherent in bicycle traffic networks. While the GCN remains robust up to 80% sparsity, its performance declines sharply beyond this threshold, highlighting the challenges of extreme data sparsity. These findings underscore the potential of GCNs for bicycling volume estimation, while also emphasizing the need for further research on methods to improve model resilience under high-sparsity conditions. Our findings offer valuable insights for city planners aiming to improve bicycling infrastructure and promote sustainable transportation.
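The abstract does not say how the sparsity levels were simulated. One plausible reading, sketched below, is to hide the observed count on a random fraction of links and keep the rest as training labels. The function name `mask_links`, the uniform random sampling, and the rounding rule are assumptions made for illustration, not the authors' procedure.

```python
import numpy as np

def mask_links(n_links: int, sparsity: float, seed: int = 0) -> np.ndarray:
    """Return a boolean mask that is True where a link's count stays observed.

    `sparsity` is the fraction of links whose counts are hidden.
    Uniform random sampling is an assumption, not the paper's stated method.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    rng = np.random.default_rng(seed)
    n_hidden = int(round(n_links * sparsity))
    # Pick the links to hide without replacement so none is counted twice.
    hidden_idx = rng.choice(n_links, size=n_hidden, replace=False)
    observed = np.ones(n_links, dtype=bool)
    observed[hidden_idx] = False
    return observed

N_LINKS = 15_933  # number of road segments reported in the abstract

for level in (0.0, 0.5, 0.8, 0.99):
    observed = mask_links(N_LINKS, level)
    print(f"sparsity={level:.2f}  observed links={int(observed.sum())}")
```

Under this rounding rule, 99% sparsity leaves only 159 of the 15,933 links with an observed count, which gives a sense of how little signal a model has to work with past the 80% threshold.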
Problem

Research questions and friction points this paper is trying to address.

Evaluating data sparsity impact on bicycling volume estimation
Using GCN to model link-level bicycling volumes effectively
Assessing GCN performance under varying sparsity conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Graph Convolutional Network for bicycling volume estimation
Evaluates data sparsity impact from 0% to 99%
Benchmarks GCN against traditional machine learning models
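To make the GCN idea concrete, here is a minimal NumPy sketch of one graph convolution over a toy network in which each node is a road link and edges join adjacent links. It uses the widely known symmetric normalization with self-loops; whether the paper uses this exact variant, and how many layers or features it has, is not stated in this summary, so the graph, feature sizes, and weights below are all illustrative assumptions.

```python
import numpy as np

def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric 0/1 adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_hat.sum(axis=1)              # degree incl. self-loop, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm: np.ndarray, features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(a_norm @ features @ weights, 0.0)

# Toy network: 4 links connected in a chain (hypothetical, not Melbourne data).
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))   # 3 made-up features per link
weights = rng.normal(size=(3, 2))    # random stand-in for learned weights

a_norm = normalized_adjacency(adj)
hidden = gcn_layer(a_norm, features, weights)
print(hidden.shape)
```

Each output row mixes a link's own features with those of its neighbours, which is the mechanism that lets a GCN exploit spatial dependencies that per-link models such as linear regression or random forest do not see.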
Mohit Gupta
School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
D. Bhowmick
School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
M. Saberi
School of Civil and Environmental Engineering, Research Centre for Integrated Transport Innovation (rCITI)
Shirui Pan
Professor, ARC Future Fellow, FQA, Director of TrustAGI Lab, Griffith University
Data Mining, Machine Learning, Graph Neural Networks, Trustworthy AI, Time Series
Ben Beck
Sustainable Mobility and Safety Research, Monash University
Active transport, Bike riding, Walking, Road safety