Beyond Accuracy: A Unified Random Matrix Theory Diagnostic Framework for Crash Classification Models

📅 2026-02-23

📈 Citations: 0

✨ Influential: 0

career value

231K/year

🤖 AI Summary

This study addresses the challenge of detecting implicit overfitting in traffic collision classification models, which conventional evaluation metrics often fail to capture. It introduces, for the first time, a unified spectral diagnostic framework by integrating Random Matrix Theory (RMT) and Heavy-Tailed Self-Regularization (HTSR) to assess model generalization through spectral analysis of weight and Hessian matrices. The work proposes the power-law exponent α as a universal indicator of model structural quality and leverages it to design an early-stopping criterion and model selection protocol. Experiments on two large-scale real-world datasets demonstrate that well-regularized models consistently exhibit α ∈ [2, 4] (mean 2.87 ± 0.34), whereas α < 2 or spectral collapse signals overfitting. The metric α shows strong agreement with expert judgment (Spearman ρ = 0.89, p < 0.001) and significantly outperforms traditional cross-validation baselines such as F1 score.

Technology Category

Application Category

📝 Abstract

Crash classification models in transportation safety are typically evaluated using accuracy, F1, or AUC, metrics that cannot reveal whether a model is silently overfitting. We introduce a spectral diagnostic framework grounded in Random Matrix Theory (RMT) and Heavy-Tailed Self-Regularization (HTSR) that spans the ML taxonomy: weight matrices for BERT/ALBERT/Qwen2.5, out-of-fold increment matrices for XGBoost/Random Forest, empirical Hessians for Logistic Regression, induced affinity matrices for Decision Trees, and Graph Laplacians for KNN. Evaluating nine model families on two Iowa DOT crash classification tasks (173,512 and 371,062 records respectively), we find that the power-law exponent $α$ provides a structural quality signal: well-regularized models consistently yield $α$ within $[2, 4]$ (mean $2.87 \pm 0.34$), while overfit variants show $α< 2$ or spectral collapse. We observe a strong rank correlation between $α$ and expert agreement (Spearman $ρ= 0.89$, $p < 0.001$), suggesting spectral quality captures model behaviors aligned with expert reasoning. We propose an $α$-based early stopping criterion and a spectral model selection protocol, and validate both against cross-validated F1 baselines. Sparse Lanczos approximations make the framework scalable to large datasets.

Problem

Research questions and friction points this paper is trying to address.

crash classification

overfitting

model evaluation

Random Matrix Theory

spectral diagnostics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Random Matrix Theory

Heavy-Tailed Self-Regularization

spectral diagnostics