🤖 AI Summary
Short-term public transit passenger flow forecasting has evolved rapidly in its methods, yet it still faces persistent challenges: weak multimodal data fusion, underutilization of open heterogeneous data, poor model interpretability, and high deployment costs. Method: This study conducts a systematic bibliometric analysis of 814 publications (1984–2024), combining a citation network variant with BERTopic-based topic modelling to trace the field's evolution quantitatively. Results: Research activity surged after 2008 and shifted from classical statistical models (e.g., ARIMA) toward deep learning, particularly graph neural networks and spatiotemporal Transformers. The study also proposes a dynamic weighted citation network to uncover structural biases in knowledge diffusion and shows how foundation models are reshaping methodological priorities. These findings provide a theoretically grounded, technically actionable framework for developing trustworthy, deployable passenger flow forecasting systems.
📝 Abstract
This paper presents a bibliometric analysis of short-term passenger flow forecasting in local public transit, covering 814 publications from 1984 to 2024. In addition to common bibliometric tools, a variant of a citation network was developed and topic modelling was conducted. The analysis shows that research activity was sporadic before 2008 and then accelerated markedly, accompanied by a shift from conventional statistical and machine learning methods (e.g., ARIMA, SVM, and basic neural networks) to specialised deep learning architectures. Based on this insight, a connection to more general fields such as machine learning and time series modelling was established. Beyond modelling trends, spatial, linguistic, and modal biases were identified, and findings from the existing secondary literature were validated and quantified. This revealed gaps, such as limited data fusion and limited use of open (multivariate) data, as well as underappreciated challenges related to model interpretability, cost-efficiency, and the balance between algorithmic performance and practical deployment considerations. In connection with the superordinate fields, the growing relevance of foundation models is also noteworthy.