🤖 AI Summary
This study addresses the public safety risks posed by alcohol intoxication by proposing a non-intrusive drunkenness detection method based on facial video sequences. Existing approaches are limited in how they model facial dynamics and fuse multi-scale spatiotemporal features. To address this, we introduce a new drunken-behavior video dataset (3,542 clips from 202 individuals) and design a dual-stream dynamic feature extraction architecture. One stream is a Graph Attention Network (GAT) that models the topological relationships among facial landmarks. The other is a 3D ResNet that captures local spatiotemporal texture variations. On this benchmark, the method achieves 95.82% accuracy, 0.977 precision, and 0.97 recall. It outperforms both single-stream baselines, a custom 3D-CNN and VGGFace+LSTM, and shows potential for practical deployment.
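The summary describes the GAT stream only at a high level. The following is a minimal illustration of what "modeling facial landmark topological relationships" can mean: a single-head graph attention layer in the standard GAT formulation, written in plain Python. Per-edge scores come from a LeakyReLU over concatenated projected features and are normalized with a softmax over each node's neighbourhood. The landmark graph (`neighbors`), the 2D coordinate features, and the weights `w` and `a` are hypothetical placeholders. The abstract does not say how the paper builds its landmark graph, how many heads or layers it uses, or what node features it starts from.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply an (out_dim x in_dim) matrix by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def elu(x: float) -> float:
    return x if x > 0 else math.exp(x) - 1.0


def gat_layer(
    features: List[Vector],
    neighbors: Dict[int, List[int]],
    w: Matrix,
    a: Vector,
) -> Tuple[List[Vector], List[Dict[int, float]]]:
    """Single-head graph attention layer (illustrative sketch, not the paper's code).

    features[i]  : input features of landmark i (hypothetical: its (x, y) position)
    neighbors[i] : landmarks adjacent to i in a hypothetical, symmetric face graph
    w            : shared projection matrix, shape (out_dim x in_dim)
    a            : attention vector of length 2 * out_dim
    Returns the updated node features and, per node, its attention weights.
    """
    projected = [matvec(w, h) for h in features]
    out_dim = len(w)
    outputs: List[Vector] = []
    attention: List[Dict[int, float]] = []
    for i, hi in enumerate(projected):
        # Each landmark attends to itself (self-loop) and to its graph neighbours.
        nbrs = [i] + [j for j in neighbors.get(i, []) if j != i]
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]); list "+" concatenates the vectors.
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, hi + projected[j]))) for j in nbrs]
        # Softmax over the neighbourhood, shifted by the max for numerical stability.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of projected neighbour features, then ELU.
        agg = [sum(al * projected[j][k] for al, j in zip(alphas, nbrs)) for k in range(out_dim)]
        outputs.append([elu(v) for v in agg])
        attention.append(dict(zip(nbrs, alphas)))
    return outputs, attention


if __name__ == "__main__":
    # Three hypothetical landmarks: left eye corner, right eye corner, nose tip.
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    graph = {0: [1, 2], 1: [0], 2: [0]}
    identity = [[1.0, 0.0], [0.0, 1.0]]
    out, att = gat_layer(feats, graph, identity, [0.0, 0.0, 0.0, 0.0])
    # A zero attention vector gives uniform weights, i.e. neighbourhood means.
    print(out)
```

With a learned attention vector `a`, the weights stop being uniform. Each landmark can then emphasize the neighbours whose motion is most informative, rather than averaging them equally.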
📝 Abstract
Alcohol consumption is a significant public health concern and a major cause of accidents and fatalities worldwide. This study introduces a novel approach to detecting alcohol intoxication by analyzing facial sequences in video. The method integrates facial landmark analysis via a Graph Attention Network (GAT) with spatiotemporal visual features extracted by a 3D ResNet. These features are dynamically fused with adaptive prioritization to enhance classification performance. Additionally, we introduce a curated dataset of 3,542 video segments from 202 individuals to support training and evaluation. Our model is compared against two baselines: a custom 3D-CNN and a VGGFace+LSTM architecture. Experimental results show that our approach achieves 95.82% accuracy, 0.977 precision, and 0.97 recall, outperforming both baselines. These findings demonstrate the model's potential for practical deployment in public safety systems for non-invasive, reliable alcohol intoxication detection.