🤖 AI Summary
To address the challenge of simultaneously achieving fine-grained distraction-behavior modeling, cross-driver/environment generalization, and real-time inference on resource-constrained vehicular edge devices, this paper proposes a synergistic framework integrating dynamic Region-of-Interest (ROI) routing with domain-invariant adversarial learning. Methodologically: (1) we design saliency-driven Top-K ROI pooling with dynamic computation routing, activating local ROI inference only for hard samples to reduce average FLOPs; (2) we introduce pseudo-domain labels and adversarial training to improve robustness to unseen drivers and degraded conditions (e.g., motion blur, low illumination). Evaluated on the State Farm dataset, the model outperforms state-of-the-art lightweight methods in accuracy while reducing inference latency by 32% and FLOPs by 41%. Crucially, it maintains stable performance under cross-domain evaluation, achieving a unified balance of efficiency, compactness, and generalization.
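The two components above can be illustrated with a minimal sketch. Everything here is an illustrative assumption rather than the paper's implementation: the function name, the channel-mean saliency proxy, and the confidence threshold used to flag "hard" samples are all hypothetical.

```python
import numpy as np

def topk_roi_routing(feat_map, global_probs, k=3, conf_thresh=0.9):
    """Hypothetical sketch: pick Top-K salient spatial cells and decide
    whether this sample should be routed through the costlier ROI branch."""
    # Saliency proxy: channel-wise mean activation per spatial cell.
    saliency = feat_map.mean(axis=0)              # (H, W)
    flat = saliency.ravel()
    topk_idx = np.argsort(flat)[-k:][::-1]        # Top-K cells, most salient first
    rois = [np.unravel_index(i, saliency.shape) for i in topk_idx]
    # Dynamic routing: only "hard" samples (low global-branch confidence)
    # trigger local ROI inference, reducing average FLOPs over a dataset.
    run_roi_branch = bool(global_probs.max() < conf_thresh)
    return rois, run_roi_branch
```

In this sketch, a confident global prediction skips the ROI branch entirely, so easy samples pay only the cost of the lightweight backbone.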
📝 Abstract
Driver distraction behavior recognition using in-vehicle cameras demands real-time inference on edge devices. However, lightweight models often fail to capture fine-grained behavioral cues, resulting in reduced performance on unseen drivers or under varying conditions. ROI-based methods can recover such cues but increase computational cost, making it difficult to balance efficiency and accuracy. To address these constraints, we propose C-DIRA: Computationally efficient Dynamic region of Interest Routing and domain-invariant Adversarial learning for lightweight driver behavior recognition. The framework combines saliency-driven Top-K ROI pooling with fused classification for extracting and integrating local features. Dynamic ROI routing enables selective computation by applying ROI inference only to high-difficulty samples. In addition, pseudo-domain labeling and adversarial learning are used to learn domain-invariant features robust to driver and background variation. Experiments on the State Farm Distracted Driver Detection Dataset show that C-DIRA maintains high accuracy with significantly fewer FLOPs and lower latency than prior lightweight models. It also remains robust under visual degradation such as motion blur and low illumination, and performs stably on unseen domains. These results confirm C-DIRA's effectiveness in jointly achieving compactness, efficiency, and generalization.
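The pseudo-domain labeling and adversarial-learning component can likewise be sketched under stated assumptions: the k-means-style clustering with farthest-point initialization, the function names, and the `lam` coefficient are illustrative stand-ins for the paper's actual pseudo-domain assignment and gradient-reversal setup.

```python
import numpy as np

def assign_pseudo_domains(features, n_domains=2, n_iter=10):
    """Hypothetical sketch: assign pseudo-domain labels by clustering
    per-sample appearance features (a stand-in for unknown driver or
    background groupings when true domain labels are unavailable)."""
    # Farthest-point initialization keeps the starting centers separated.
    centers = [features[0]]
    for _ in range(1, n_domains):
        dists = np.min([np.linalg.norm(features - c, axis=1) for c in centers], axis=0)
        centers.append(features[dists.argmax()])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(features), dtype=int)
    for _ in range(n_iter):
        # Assign each sample to its nearest center, then update centers.
        d = np.linalg.norm(features[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(n_domains):
            if (labels == j).any():
                centers[j] = features[labels == j].mean(axis=0)
    return labels

def grad_reverse(grad, lam=1.0):
    """Gradient Reversal Layer backward pass: the forward pass is the
    identity, while gradients flowing back from the domain classifier
    are flipped, pushing the feature extractor toward features the
    domain classifier cannot separate (i.e., domain-invariant ones)."""
    return -lam * grad
```

Training against these pseudo-domain labels through the reversed gradient is what discourages the backbone from encoding driver- or background-specific cues.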