Comparing hundreds of machine learning classifiers and discrete choice models in predicting travel behavior: an empirical benchmark

📅 2021-02-01
🏛️ arXiv.org
📈 Citations: 26
Influential: 2
🤖 AI Summary
This study addresses the lack of systematic benchmarks for evaluating model performance in travel demand forecasting by establishing the first tournament-based large-scale empirical benchmark, incorporating 6,970 experiments across 105 machine learning and discrete choice models. Methodologically, it integrates ANOVA, mixed-effects modeling, meta-analysis, and diverse modeling techniques—including random forests, XGBoost, multilayer perceptrons, LSTMs, and multinomial/nested/mixed logit models. Key contributions are: (1) contextual factors—such as sample size and choice-set dimensionality—explain over 47% of performance variance, substantially exceeding the explanatory power of model type; (2) irreducible residual stochasticity is identified, exposing inherent uncertainty in model comparisons; and (3) a novel paradigm of “cross-context transferability” is proposed, shifting travel demand modeling from isolated model optimization toward context-aware, robust design.
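The summary's central claim is that contextual factors explain more of the variance in predictive performance than model type does. The decomposition behind that kind of statement can be sketched with a one-way ANOVA-style calculation: group experiment accuracies by each factor and compare the between-group sum of squares to the total. The sketch below uses only the Python standard library and purely synthetic accuracy values (the model-family and context names, and all numbers, are illustrative assumptions, not the paper's data or its mixed-effects specification):

```python
from statistics import mean

# Hypothetical experiment log: (model_family, data_context, accuracy).
# Synthetic values chosen only to illustrate the decomposition.
records = [
    ("random_forest", "large_sample", 0.66),
    ("random_forest", "small_sample", 0.55),
    ("xgboost",       "large_sample", 0.68),
    ("xgboost",       "small_sample", 0.57),
    ("mnl",           "large_sample", 0.61),
    ("mnl",           "small_sample", 0.52),
]

accs = [acc for _, _, acc in records]
grand = mean(accs)
ss_total = sum((a - grand) ** 2 for a in accs)

def between_group_share(key_index):
    """One-way ANOVA share: SS_between for one factor divided by SS_total."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[key_index], []).append(rec[2])
    ss_between = sum(len(v) * (mean(v) - grand) ** 2 for v in groups.values())
    return ss_between / ss_total

share_model = between_group_share(0)    # variance explained by model family
share_context = between_group_share(1)  # variance explained by data context

print(f"model family explains {share_model:.0%} of accuracy variance")
print(f"context explains {share_context:.0%} of accuracy variance")
```

With these synthetic numbers, the context factor accounts for most of the variance and model family for much less, mirroring the direction of the paper's finding; the two shares need not sum to one, since interaction effects and residual randomness absorb the remainder, which is exactly the "irreducible residual stochasticity" the study highlights.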
📝 Abstract
Numerous studies have compared machine learning (ML) and discrete choice models (DCMs) in predicting travel demand. However, these studies often lack generalizability as they compare models deterministically without considering contextual variations. To address this limitation, our study develops an empirical benchmark by designing a tournament model, thus efficiently summarizing a large number of experiments, quantifying the randomness in model comparisons, and using formal statistical tests to differentiate between the model and contextual effects. This benchmark study compares two large-scale data sources: a database compiled from literature review summarizing 136 experiments from 35 studies, and our own experiment data, encompassing a total of 6,970 experiments from 105 models and 12 model families. This benchmark study yields two key findings. Firstly, many ML models, particularly the ensemble methods and deep learning, statistically outperform the DCM family (i.e., multinomial, nested, and mixed logit models). However, this study also highlights the crucial role of the contextual factors (i.e., data sources, inputs and choice categories), which can explain models' predictive performance more effectively than the differences in model types alone. Model performance varies significantly with data sources, improving with larger sample sizes and lower dimensional alternative sets. After controlling all the model and contextual factors, significant randomness still remains, implying inherent uncertainty in such model comparisons. Overall, we suggest that future researchers shift more focus from context-specific model comparisons towards examining model transferability across contexts and characterizing the inherent uncertainty in ML, thus creating more robust and generalizable next-generation travel demand models.
Problem

Research questions and friction points this paper is trying to address.

Travel Demand Prediction
Contextual Factors
Model Accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine Learning Models
Contextual Factors
Uncertainty in Model Comparison
Shenhao Wang
University of Florida; Massachusetts Institute of Technology
Urban AI · Computational Social Science · Travel Behavior · Urban Systems · Resilience
Baichuan Mo
PhD @ MIT, Research Scientist @ TikTok, Lyft
Transportation · Optimization · Machine Learning · Demand Modeling
S. Hess
Choice Modeling Centre & Institute for Transport Studies, University of Leeds
Jinhua Zhao
Department of Urban Studies and Planning, Massachusetts Institute of Technology