AI Summary
Traffic accident prediction remains challenging because driver cognition and dynamic road environments are difficult to model. To address this, we propose a human-centered dynamic risk perception model that integrates driving videos, textual context, and driver attention maps for fine-grained, early accident warning. Methodologically, we design an adaptive risk thresholding mechanism that jointly considers scene complexity and gaze entropy; construct a hierarchical multimodal fusion architecture that incorporates a Geo-Context Vision-Language module; employ a Bi-GRU to capture spatiotemporal dependencies; and introduce 3D spatial relation encoding with context-aware cross-modal alignment. Evaluated on benchmarks including DADA-2000, the approach achieves a 1.8-second gain in average early-warning lead time and a 23.6% reduction in false positive rate, along with improved prediction accuracy and model interpretability.
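The summary does not give the form of the adaptive threshold, so the following is only a minimal sketch of how scene complexity and gaze entropy could jointly shift an alarm threshold. The function names, the linear rule, the weights `alpha` and `beta`, and the clamping range are all assumptions made for illustration, not the paper's formulation. Gaze entropy is computed here as the normalized Shannon entropy of a driver attention map.

```python
import math


def gaze_entropy(attention_map):
    """Normalized Shannon entropy of a 2D attention map, in [0, 1].

    0 means all attention mass sits on one cell; 1 means it is spread uniformly.
    """
    values = [v for row in attention_map for v in row]
    total = sum(values)
    if total <= 0 or len(values) < 2:
        return 0.0
    probs = [v / total for v in values if v > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(values))


def adaptive_threshold(base, scene_complexity, entropy, alpha=0.1, beta=0.1):
    """Hypothetical linear rule (an assumption, not the paper's method).

    Raises the threshold in complex scenes to suppress false alarms and
    lowers it when gaze is dispersed, so that alerts fire earlier.
    Both inputs are expected in [0, 1]; the result is clamped to [0.05, 0.95].
    """
    threshold = base + alpha * scene_complexity - beta * entropy
    return min(0.95, max(0.05, threshold))


uniform_gaze = [[1.0, 1.0], [1.0, 1.0]]   # attention spread over the scene
focused_gaze = [[1.0, 0.0], [0.0, 0.0]]   # attention on a single region

print(adaptive_threshold(0.5, 0.2, gaze_entropy(uniform_gaze)))
print(adaptive_threshold(0.5, 0.2, gaze_entropy(focused_gaze)))
```

Under this assumed rule, the dispersed gaze pattern yields a lower threshold than the focused one for the same scene complexity, which is one plausible way to trade false alarms against recall.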
Abstract
Accurate accident anticipation remains challenging when driver cognition and dynamic road conditions are underrepresented in predictive models. In this paper, we propose CAMERA (Context-Aware Multi-modal Enhanced Risk Anticipation), a multi-modal framework that integrates dashcam video, textual annotations, and driver attention maps for robust accident anticipation. Unlike existing methods that rely on static or environment-centric thresholds, CAMERA employs an adaptive mechanism guided by scene complexity and gaze entropy, reducing false alarms while maintaining high recall in dynamic, multi-agent traffic scenarios. A hierarchical fusion pipeline with a bidirectional GRU (Bi-GRU) captures spatio-temporal dependencies, while a Geo-Context Vision-Language module translates 3D spatial relationships into interpretable, human-centric alerts. Evaluations on DADA-2000 and other benchmarks show that CAMERA achieves state-of-the-art performance, improving both accuracy and lead time. These results demonstrate the effectiveness of modeling driver attention, contextual description, and adaptive risk thresholds for more reliable accident anticipation.
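Lead time in accident anticipation is commonly measured as the interval between the first frame at which the predicted risk crosses the alarm threshold and the frame at which the accident begins. The abstract does not state the exact evaluation protocol, so the sketch below is an assumption; the function name, the frame rate, and the example scores are hypothetical.

```python
def lead_time_seconds(risk_scores, threshold, accident_frame, fps):
    """Seconds between the first alarm and the accident frame.

    risk_scores: per-frame risk values, one per video frame.
    accident_frame: index of the frame where the accident starts.
    Returns None if no frame before the accident reaches the threshold.
    """
    for frame, score in enumerate(risk_scores[:accident_frame]):
        if score >= threshold:
            return (accident_frame - frame) / fps
    return None


# Hypothetical per-frame scores for a short clip sampled at 2 frames per second.
scores = [0.1, 0.2, 0.6, 0.9, 0.95]
print(lead_time_seconds(scores, threshold=0.5, accident_frame=4, fps=2))
```

Averaging this value over all positive clips gives a mean lead time; a lower threshold lengthens lead time but tends to raise the false positive rate, which is the trade-off an adaptive threshold aims to manage.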