Detection of Intoxicated Individuals from Facial Video Sequences via a Recurrent Fusion Model

📅 2025-12-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the public safety risks posed by alcohol intoxication by proposing a non-intrusive detection method based on facial video sequences. It introduces a curated dataset of 3,542 video segments from 202 individuals and a dual-stream architecture that combines a Graph Attention Network (GAT), which models relationships among facial landmarks, with a 3D ResNet, which extracts spatiotemporal visual features. The two feature streams are fused dynamically with adaptive prioritization. On this dataset, the method achieves 95.82% accuracy, 0.977 precision, and 0.97 recall, outperforming two baselines (a custom 3D-CNN and a VGGFace+LSTM architecture) and indicating potential for practical deployment in public safety systems.

📝 Abstract

Alcohol consumption is a significant public health concern and a major cause of accidents and fatalities worldwide. This study introduces a novel video-based facial sequence analysis approach dedicated to the detection of alcohol intoxication. The method integrates facial landmark analysis via a Graph Attention Network (GAT) with spatiotemporal visual features extracted using a 3D ResNet. These features are dynamically fused with adaptive prioritization to enhance classification performance. Additionally, we introduce a curated dataset comprising 3,542 video segments derived from 202 individuals to support training and evaluation. Our model is compared against two baselines: a custom 3D-CNN and a VGGFace+LSTM architecture. Experimental results show that our approach achieves 95.82% accuracy, 0.977 precision, and 0.97 recall, outperforming prior methods. The findings demonstrate the model's potential for practical deployment in public safety systems for non-invasive, reliable alcohol intoxication detection.
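For readers less familiar with the reported metrics, the sketch below shows how accuracy, precision, and recall are conventionally derived from binary confusion-matrix counts. The function name and the example counts are hypothetical and chosen only for illustration; they are not the paper's evaluation code or its actual confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) from binary confusion counts.

    tp/fp/tn/fn are true/false positives/negatives, where "positive"
    is assumed here to mean "intoxicated".
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts, for illustration only.
acc, prec, rec = classification_metrics(tp=8, fp=2, tn=9, fn=1)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Precision answers "of the clips flagged as intoxicated, how many really were", while recall answers "of the intoxicated clips, how many were flagged"; reporting both alongside accuracy matters when the two classes are not balanced.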

Problem

Research questions and friction points this paper is trying to address.

Detects alcohol intoxication from facial video sequences

Integrates facial landmarks and spatiotemporal features via fusion model

Aims for non-invasive public safety deployment with high accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph Attention Network analyzes facial landmarks

3D ResNet extracts spatiotemporal visual features

Adaptive prioritization fuses features dynamically
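The summary does not specify how the adaptive fusion is implemented. One common way to prioritize between two feature streams is a softmax gate that weights each stream before summing them. The following is a minimal sketch under that assumption; `fuse`, `gate_landmark`, and `gate_visual` are hypothetical names, and in a real model the gate parameters would be learned and the features would come from the GAT and 3D ResNet branches.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(landmark_feat, visual_feat, gate_landmark, gate_visual):
    """Gated fusion of two equal-length feature vectors (illustrative only).

    Each stream gets a scalar score (dot product with its gate vector);
    a softmax over the two scores yields the stream weights, and the
    fused vector is the weighted sum of the two streams.
    """
    if len(landmark_feat) != len(visual_feat):
        raise ValueError("feature vectors must have the same length")
    score_l = sum(w * x for w, x in zip(gate_landmark, landmark_feat))
    score_v = sum(w * x for w, x in zip(gate_visual, visual_feat))
    w_l, w_v = softmax([score_l, score_v])
    fused = [w_l * a + w_v * b for a, b in zip(landmark_feat, visual_feat)]
    return fused, (w_l, w_v)


# With all-zero gates both streams score equally, so each gets weight 0.5.
fused, weights = fuse([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0])
print(fused, weights)
```

Because the weights depend on the input features, the gate can favor the landmark stream for one clip and the visual stream for another, which is the general idea behind input-dependent prioritization.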
