CrashSage: A Large Language Model-Centered Framework for Contextual and Interpretable Traffic Crash Analysis

📅 2025-05-08
🤖 AI Summary
Problem: Traffic crashes cause over 1.3 million fatalities worldwide each year, creating an urgent need for interpretable, data-driven crash analysis that can inform effective road safety interventions. Method: This work introduces CrashSage, an LLM-centered framework for interpretable crash analysis. Traditional statistical models rely on structured fields and lose contextual detail; CrashSage instead transforms tabular crash data into semantically rich textual narratives. The framework applies context-aware augmentation that preserves factual fidelity and integrates gradient-based explainability methods (e.g., Integrated Gradients) to support fine-grained severity prediction and multi-level attribution analysis. Contribution/Results: The fine-tuned LLaMA3-8B model outperforms zero-shot, chain-of-thought, and few-shot baselines built on GPT-4o, GPT-4o-mini, and LLaMA3-70B in crash severity prediction. It supports both instance-level attribution and macro-level risk factor identification, making the results more actionable for road safety policy and intervention design.

📝 Abstract
Road crashes claim over 1.3 million lives annually worldwide and incur global economic losses exceeding $1.8 trillion. Such profound societal and financial impacts underscore the urgent need for road safety research that uncovers crash mechanisms and delivers actionable insights. Conventional statistical models and tree ensemble approaches typically rely on structured crash data, overlooking contextual nuances and struggling to capture complex relationships and underlying semantics. Moreover, these approaches tend to incur significant information loss, particularly in narrative elements related to multi-vehicle interactions, crash progression, and rare event characteristics. This study presents CrashSage, a novel Large Language Model (LLM)-centered framework designed to advance crash analysis and modeling through four key innovations. First, we introduce a tabular-to-text transformation strategy paired with a relational data integration schema, enabling the conversion of raw, heterogeneous crash data into enriched, structured textual narratives that retain essential structural and relational context. Second, we apply context-aware data augmentation using a base LLM to improve narrative coherence while preserving factual integrity. Third, we fine-tune the LLaMA3-8B model for crash severity inference, demonstrating superior performance over baseline approaches, including zero-shot, zero-shot with chain-of-thought prompting, and few-shot learning, across multiple models (GPT-4o, GPT-4o-mini, LLaMA3-70B). Finally, we employ a gradient-based explainability technique to elucidate model decisions at both the individual crash level and across broader risk factor dimensions. This interpretability mechanism enhances transparency and enables targeted road safety interventions by providing deeper insights into the most influential factors.
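To make the gradient-based explainability step concrete, here is a minimal sketch of Integrated Gradients, the method the AI summary gives as an example. It is not the paper's implementation: the logistic model, the feature names, the weights, and the all-zero baseline are hypothetical stand-ins for a fine-tuned LLM and its token embeddings. The sketch only shows how attributions are accumulated along a straight path from a baseline to the input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Toy differentiable model (assumption): logistic score in (0, 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to the input x."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = model_grad(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the averaged gradients by the input-minus-baseline difference.
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]

# Hypothetical crash features and weights, for illustration only.
features = ["speeding", "wet_road", "seat_belt_used"]
w = [1.5, 0.8, -2.0]
b = -1.0
x = [1.0, 1.0, 0.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, w, b)

# Completeness check: attributions should sum to F(x) - F(baseline).
completeness_gap = abs(
    sum(attributions) - (model(x, w, b) - model(baseline, w, b))
)

for name, value in zip(features, attributions):
    print(f"{name}: {value:.4f}")
```

Attributions for a single crash correspond to the instance-level view described in the abstract. One plausible route to the macro-level view is to aggregate such attributions over many crashes, although the abstract does not specify how the aggregation is done.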
Problem

Research questions and friction points this paper is trying to address.

Analyzing road crashes with contextual and interpretable insights
Overcoming limitations of traditional crash data analysis methods
Enhancing crash severity prediction using LLM-based techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tabular-to-text transformation for enriched crash narratives
Context-aware data augmentation using a base LLM
Fine-tuned LLaMA3-8B model for crash severity inference
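
The tabular-to-text idea in the first item can be sketched roughly as follows. This is an illustrative assumption, not the paper's schema or templates: the table layout, the `crash_id` join key, the field names, and the sentence format are all hypothetical. The sketch links vehicle and person rows to their crash record and renders each crash as one short structured narrative.

```python
from collections import defaultdict

def group_by_crash(rows):
    """Index related rows by their (hypothetical) crash_id foreign key."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["crash_id"]].append(row)
    return grouped

def describe(record, skip=("crash_id",)):
    """Render one table row as 'field: value' phrases, skipping blanks."""
    parts = []
    for key, value in record.items():
        if key in skip or value is None or value == "":
            continue
        parts.append(f"{key.replace('_', ' ')}: {value}")
    return "; ".join(parts)

def crash_to_text(crash, vehicles, persons):
    """Combine a crash row with its linked rows into one narrative."""
    lines = [f"Crash record. {describe(crash)}."]
    for i, vehicle in enumerate(vehicles, 1):
        lines.append(f"Vehicle {i}. {describe(vehicle)}.")
    for i, person in enumerate(persons, 1):
        lines.append(f"Person {i}. {describe(person)}.")
    return "\n".join(lines)

# Hypothetical sample tables, for illustration only.
crashes = [
    {"crash_id": 101, "weather": "rain",
     "light_condition": "dark, unlit", "road_surface": "wet"},
]
vehicles = [
    {"crash_id": 101, "vehicle_type": "passenger car", "maneuver": "turning left"},
    {"crash_id": 101, "vehicle_type": "pickup truck", "maneuver": "going straight"},
]
persons = [
    {"crash_id": 101, "role": "driver", "age": 34, "restraint": None},
]

vehicles_by_crash = group_by_crash(vehicles)
persons_by_crash = group_by_crash(persons)

narratives = {
    crash["crash_id"]: crash_to_text(
        crash,
        vehicles_by_crash[crash["crash_id"]],
        persons_by_crash[crash["crash_id"]],
    )
    for crash in crashes
}

print(narratives[101])
```

Missing values are dropped rather than filled in, so the text states only what the tables record. In the framework as described, a narrative of this kind would then go through the LLM-based augmentation step before fine-tuning.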
🔎 Similar Papers
2024-07-08 | 2024 IEEE International Automated Vehicle Validation Conference (IAVVC) | Citations: 1
Hao Zhen
University of Georgia
machine learning, intelligent transportation, vehicle control
Jidong J. Yang
Smart Mobility and Infrastructure Lab, College of Engineering, University of Georgia, Athens, GA, USA