Importance-aware Topic Modeling for Discovering Public Transit Risk from Noisy Social Media

📅 2025-12-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Public transportation risk signals on social media are sparse, noisy, and easily obscured by everyday discourse. Method: The authors propose an influence-weighted Poisson Deconvolution Factorization (PDF) framework. It constructs a user-influence-weighted keyword co-occurrence graph, adds a decorrelation regularizer to keep topics distinct, and combines a coherence-driven sweep for selecting the number of topics with optimization under nonnegativity and normalization constraints, yielding robust, interpretable risk topics. Contribution/Results: The work brings user influence modeling into topic decomposition for sparse, short texts. On large-scale social streams, the model achieves state-of-the-art topic coherence and strong topic diversity compared with leading baselines. The code and dataset are publicly available.

📝 Abstract
Urban transit agencies increasingly turn to social media to monitor emerging service risks such as crowding, delays, and safety incidents, yet the signals of concern are sparse, short, and easily drowned out by routine chatter. We address this challenge by jointly modeling linguistic interactions and user influence. First, we construct an influence-weighted keyword co-occurrence graph from cleaned posts so that socially impactful posts contribute proportionally to the underlying evidence. The core of our framework is a Poisson Deconvolution Factorization (PDF) that decomposes this graph into a low-rank topical structure and topic-localized residual interactions, producing an interpretable topic-keyword basis together with topic importance scores. A decorrelation regularizer promotes distinct topics, and a lightweight optimization procedure ensures stable convergence under nonnegativity and normalization constraints. Finally, the number of topics is selected through a coherence-driven sweep that evaluates the quality and distinctness of the learned topics. On large-scale social streams, the proposed model achieves state-of-the-art topic coherence and strong diversity compared with leading baselines. The code and dataset are publicly available at https://github.com/pangjunbiao/Topic-Modeling_ITS.git
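The first step of the abstract, an influence-weighted keyword co-occurrence graph, can be illustrated with a minimal Python sketch. It is not taken from the authors' repository: the function names, the `(keywords, followers)` post format, and the log-damped follower count used as the influence score are all assumptions made for illustration, since the abstract does not say how influence is measured.

```python
import math
from collections import defaultdict
from itertools import combinations


def influence_weight(followers: int) -> float:
    # ASSUMPTION: influence is a log-damped follower count; the paper's
    # actual influence score is not specified in the abstract.
    return 1.0 + math.log1p(followers)


def build_cooccurrence(posts):
    """Build an undirected weighted co-occurrence graph.

    posts: iterable of (keywords, followers) pairs, one per cleaned post.
    Returns a dict mapping a sorted keyword pair (a, b) to its edge weight.
    """
    graph = defaultdict(float)
    for keywords, followers in posts:
        w = influence_weight(followers)
        # Each distinct keyword pair in a post adds that post's influence.
        for a, b in combinations(sorted(set(keywords)), 2):
            graph[(a, b)] += w
    return dict(graph)


# Toy data (hypothetical): two posts, the second from a more-followed user.
posts = [
    (["delay", "line2", "crowded"], 0),
    (["delay", "line2"], 99),
]
graph = build_cooccurrence(posts)
```

In this toy example the pair ("delay", "line2") appears in both posts, so its edge accumulates both influence weights, while pairs seen only in the low-influence post keep a weight of 1.0.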
Problem

Research questions and friction points this paper is trying to address.

Detects transit risks from sparse social media posts
Models user influence and linguistic interactions jointly
Extracts interpretable topics with importance scores
Innovation

Methods, ideas, or system contributions that make the work stand out.

Influence-weighted keyword co-occurrence graph construction
Poisson Deconvolution Factorization for topic decomposition
Coherence-driven topic selection with decorrelation regularization
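As a rough illustration of the factorization idea listed above, the sketch below fits a nonnegative low-rank model to a small co-occurrence matrix under a Poisson (generalized KL) objective. It is plain KL-NMF with the standard multiplicative updates, swapped in as a stand-in: it is not the paper's Poisson Deconvolution Factorization, it has no topic-localized residual term, and topic overlap is only reported as a diagnostic rather than penalized during optimization. All function names and the toy matrix are hypothetical.

```python
import numpy as np


def poisson_nmf(V, k, n_iter=200, eps=1e-10, seed=0):
    """Approximate V (n x m, nonnegative) by W @ H with W, H >= 0.

    Uses multiplicative updates for the generalized KL divergence, which
    matches maximizing a Poisson likelihood up to constants.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        R = V / (W @ H + eps)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)
        R = V / (W @ H + eps)
        W *= (R @ H.T) / (H.sum(axis=1)[None, :] + eps)
        # Normalization: each topic column of W sums to 1; the scale is
        # moved into H so the product W @ H is unchanged.
        scale = W.sum(axis=0) + eps
        W /= scale
        H *= scale[:, None]
    return W, H


def kl_divergence(V, W, H, eps=1e-10):
    # Generalized KL divergence between V and its reconstruction W @ H.
    P = W @ H + eps
    return float(np.sum(V * np.log((V + eps) / P) - V + P))


def offdiag_overlap(W, eps=1e-10):
    # Diagnostic only: summed cosine similarity between distinct topics.
    Wn = W / (np.linalg.norm(W, axis=0, keepdims=True) + eps)
    G = Wn.T @ Wn
    return float(np.sum(G) - np.trace(G))


# Toy symmetric co-occurrence matrix with two keyword blocks (hypothetical).
V = np.array([[4, 3, 0, 0],
              [3, 5, 0, 0],
              [0, 0, 6, 2],
              [0, 0, 2, 4]], dtype=float)
W, H = poisson_nmf(V, k=2)
overlap = offdiag_overlap(W)
```

Under these assumptions, each column of `W` can be read as a topic's distribution over keywords, and the row sums of `H` give one possible notion of topic importance. A model closer to the one described would add a decorrelation term to the objective and repeat the fit over several values of `k`, keeping the one with the best coherence.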
Fatima Ashraf
Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
Muhammad Ayub Sabir
Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
Jiaxin Deng
Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
Junbiao Pang
Beijing University of Technology
computer vision, multimedia, machine learning
Haitao Yu
Beijing Intelligent Transportation Development Center, Beijing 100161, China