🤖 AI Summary
Existing methods struggle to model complex semantic correlations among heterogeneous multimodal data in traffic accident analysis, such as textual incident reports, images, environmental conditions, and driver behavior, which hinders accurate identification of critical risk factors. This paper proposes TrafficSafe, a framework that unifies accident prediction and causal attribution as a text-based reasoning task. It does so through multimodal data textualization, traffic-domain knowledge injection, large language model (LLM) customization via fine-tuning and prompt engineering, and a sentence-level feature attribution algorithm, TrafficSafe Attribution, for conditional risk analysis and data collection optimization. Evaluated on 58,903 real-world traffic accidents, TrafficSafe improves average F1-score by 42% over baselines. It also quantitatively identifies alcohol-impaired driving as the leading risk factor for severe crashes, with aggressive and impairment-related behaviors contributing nearly twice as much to severe-crash risk as other driver behaviors.
📝 Abstract
Predicting crash events is crucial for understanding crash distributions and their contributing factors, thereby enabling the design of proactive traffic safety policy interventions. However, existing methods struggle to interpret the complex interplay among various sources of traffic crash data, including numeric characteristics, textual reports, crash imagery, environmental conditions, and driver behavior records. As a result, they often fail to capture the rich semantic information and intricate interrelationships embedded in these diverse data sources, limiting their ability to identify critical crash risk factors. In this research, we propose TrafficSafe, a framework that adapts LLMs to reframe crash prediction and feature attribution as text-based reasoning. A multi-modal crash dataset of 58,903 real-world reports, together with the associated infrastructure, environmental, driver, and vehicle information, is collected and textualized into the TrafficSafe Event Dataset. By customizing and fine-tuning LLMs on this dataset, the TrafficSafe LLM achieves a 42% average improvement in F1-score over baselines. To interpret these predictions and uncover contributing factors, we introduce TrafficSafe Attribution, a sentence-level feature attribution framework enabling conditional risk analysis. Findings show that alcohol-impaired driving is the leading factor in severe crashes, with aggressive and impairment-related behaviors contributing nearly twice as much to severe crashes as other driver behaviors. Furthermore, TrafficSafe Attribution highlights pivotal features during model training, guiding strategic crash data collection for iterative performance improvements. The proposed TrafficSafe offers a transformative leap in traffic safety research, providing a blueprint for translating advanced AI technologies into responsible, actionable, and life-saving outcomes.
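To make the idea of sentence-level attribution over textualized records concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a simple leave-one-sentence-out scheme: each field of a crash record becomes one sentence, and a sentence's attribution is the drop in the model's score when that sentence is removed. The function names (`textualize`, `leave_one_out_attribution`), the sample record, and the keyword-based `toy_severity_score` that stands in for a fine-tuned LLM are all hypothetical.

```python
from typing import Callable, Dict, List


def textualize(record: Dict[str, object]) -> List[str]:
    """Turn each field of a structured crash record into one plain sentence."""
    return [f"The {key.replace('_', ' ')} is {value}." for key, value in record.items()]


def leave_one_out_attribution(
    sentences: List[str], score: Callable[[str], float]
) -> Dict[str, float]:
    """Attribute the score to each sentence as the drop caused by removing it."""
    full = score(" ".join(sentences))
    attributions = {}
    for i, sentence in enumerate(sentences):
        ablated = " ".join(sentences[:i] + sentences[i + 1:])
        attributions[sentence] = full - score(ablated)
    return attributions


def toy_severity_score(text: str) -> float:
    """Hypothetical stand-in for a fine-tuned LLM's severe-crash probability."""
    weights = {"alcohol": 0.4, "speeding": 0.2, "rain": 0.1}
    lowered = text.lower()
    return sum(w for term, w in weights.items() if term in lowered)


# Illustrative record; the field names and values are invented for this sketch.
record = {
    "driver_condition": "alcohol involved",
    "driver_behavior": "speeding",
    "weather": "rain",
    "road_surface": "wet asphalt",
}
sentences = textualize(record)
attributions = leave_one_out_attribution(sentences, toy_severity_score)

# Print sentences from most to least influential on the toy score.
for sentence, value in sorted(attributions.items(), key=lambda kv: -kv[1]):
    print(f"{value:+.2f}  {sentence}")
```

In this toy setup the alcohol sentence receives the largest attribution only because the stand-in scorer weights that keyword most heavily; with a real model, `score` would be replaced by a call that returns the predicted probability of the outcome of interest.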