🤖 AI Summary
Traffic crash causation is complex, so crash models need both high predictive accuracy and interpretability. To address this, we propose the Feature Group Tabular Transformer (FGTT), the first tabular transformer architecture that constructs semantic feature tokens via domain-informed grouping. FGTT fuses heterogeneous data sources (meteorological conditions, crash reports, high-resolution traffic flow, road geometry, and infrastructure attributes) to predict crash type and to support analysis of causal mechanisms. Evaluated on real-world crash datasets, FGTT significantly outperforms widely used tree-based models (Random Forest, XGBoost, and CatBoost) in predictive performance. Using SHAP for post-hoc interpretation, FGTT identifies actionable causal patterns, such as “rainy weather + curved roadway + unsignalized intersection”, which can inform evidence-based, targeted interventions for road safety improvement.
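To make the SHAP-based interpretation mentioned above more concrete, the following is a minimal sketch of the Shapley-value idea that SHAP builds on. It is not the authors' analysis and does not use the SHAP library: the function `toy_risk`, its coefficients, and the three factor names are all hypothetical, chosen only to show how an interaction between factors is shared among them.

```python
from itertools import permutations

# Hypothetical binary risk factors, named after the example pattern in the summary.
FACTORS = ("rain", "curve", "unsignalized")


def toy_risk(present):
    """Made-up risk score: small main effects plus a rain-and-curve interaction."""
    score = 0.0
    if "rain" in present:
        score += 0.10
    if "curve" in present:
        score += 0.05
    if "unsignalized" in present:
        score += 0.05
    if "rain" in present and "curve" in present:
        score += 0.20  # interaction term (assumed, for illustration only)
    return score


def shapley_values(value_fn, players):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            before = value_fn(coalition)
            coalition.add(p)
            totals[p] += value_fn(coalition) - before
    return {p: t / len(orderings) for p, t in totals.items()}


phi = shapley_values(toy_risk, FACTORS)
print({p: round(v, 3) for p, v in phi.items()})
```

In this toy setting the 0.20 interaction is split evenly between rain and curve, so their attributions come out at roughly 0.20 and 0.15, while the unsignalized factor keeps only its 0.05 main effect. The attributions sum to the score of the full combination, which is the property that lets such values be read as a decomposition of a single prediction.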
📝 Abstract
Reliable and interpretable traffic crash modeling is essential for understanding causality and improving road safety. This study introduces a novel approach to predicting collision types by utilizing a comprehensive dataset fused from multiple sources, including weather data, crash reports, high-resolution traffic information, pavement geometry, and facility characteristics. Central to our approach is the development of a Feature Group Tabular Transformer (FGTT) model, which organizes disparate data into meaningful feature groups, represented as tokens. These group-based tokens serve as rich semantic components, enabling effective identification of collision patterns and interpretation of causal mechanisms. The FGTT model is benchmarked against widely used tree ensemble models, including Random Forest, XGBoost, and CatBoost, demonstrating superior predictive performance. Furthermore, model interpretation reveals key influential factors, providing fresh insights into the underlying causality of distinct crash types.
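The idea of turning feature groups into tokens can be sketched as follows. This is an illustrative sketch under stated assumptions, not the FGTT implementation: the group names, column names, token width `D_TOKEN`, and the use of one linear projection per group are all hypothetical, since the abstract does not specify them. The record values are made up.

```python
import random

# Hypothetical feature groups; the paper's actual grouping is not given in the abstract.
FEATURE_GROUPS = {
    "weather": ["precip_mm", "visibility_km"],
    "traffic": ["speed_mean", "volume", "occupancy"],
    "geometry": ["curvature", "grade"],
}

D_TOKEN = 4  # assumed token (embedding) width


def init_projections(groups, d_token, seed=0):
    """Create one weight matrix (d_token x group_size) and a zero bias per group."""
    rng = random.Random(seed)
    params = {}
    for name, cols in groups.items():
        weights = [[rng.gauss(0.0, 0.1) for _ in cols] for _ in range(d_token)]
        bias = [0.0] * d_token
        params[name] = (weights, bias)
    return params


def tokenize(record, groups, params):
    """Map one record (feature name -> float) to one token vector per group."""
    tokens = []
    for name, cols in groups.items():
        weights, bias = params[name]
        x = [record[c] for c in cols]
        token = [
            sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)
        ]
        tokens.append(token)
    return tokens


# A single made-up record covering every column in every group.
record = {
    "precip_mm": 2.5, "visibility_km": 4.0,
    "speed_mean": 88.0, "volume": 1200.0, "occupancy": 0.12,
    "curvature": 0.03, "grade": 1.5,
}

params = init_projections(FEATURE_GROUPS, D_TOKEN)
tokens = tokenize(record, FEATURE_GROUPS, params)
print(len(tokens), len(tokens[0]))  # prints: 3 4
```

Each record becomes a short sequence with one token per group rather than one token per column, so a transformer's attention would operate over groups such as weather or geometry. In a trained model the projection weights would be learned rather than randomly initialized as they are here.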