🤖 AI Summary
To address overfitting and poor generalization in cross-domain facial action unit (AU) detection, this paper proposes a doubly adaptive dropout mechanism that jointly suppresses domain-specific noise at both the convolutional feature-map level and the Transformer spatial-token level. The architecture introduces a Channel Drop Unit (CD-Unit) and a Token Drop Unit (TD-Unit), coupled with layer-wise auxiliary domain classifiers that guide the selective dropout of domain-sensitive features. A progressive regularization training strategy further balances robustness against the preservation of semantic information. The method achieves significant improvements over state-of-the-art approaches on multiple cross-domain AU benchmarks, and attention visualizations clearly localize activation regions for both single and compound AUs, demonstrating strong interpretability and cross-domain generalization.
📝 Abstract
Facial Action Units (AUs) are essential for conveying psychological states and emotional expressions. While automatic AU detection systems leveraging deep learning have progressed, they often overfit to specific datasets and individual features, limiting their cross-domain applicability. To overcome these limitations, we propose a doubly adaptive dropout approach for cross-domain AU detection, which enhances the robustness of convolutional feature maps and spatial tokens against domain shifts. This approach includes a Channel Drop Unit (CD-Unit) and a Token Drop Unit (TD-Unit), which work together to reduce domain-specific noise at both the channel and token levels. The CD-Unit preserves domain-agnostic local patterns in feature maps, while the TD-Unit helps the model identify AU relationships generalizable across domains. An auxiliary domain classifier, integrated at each layer, guides the selective omission of domain-sensitive features. To prevent excessive feature dropout, a progressive training strategy is used, allowing for selective exclusion of sensitive features at any model layer. Our method consistently outperforms existing techniques in cross-domain AU detection, as demonstrated by extensive experimental evaluations. Visualizations of attention maps also highlight clear and meaningful patterns related to both individual and combined AUs, further validating the approach's effectiveness.
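The core idea behind both the CD-Unit and TD-Unit is to drop the feature slices (channels of a conv feature map, or spatial tokens) that an auxiliary domain classifier finds most domain-discriminative, then rescale the survivors as in standard inverted dropout. The paper does not publish reference code here, so the following is a minimal NumPy sketch under stated assumptions: `adaptive_drop`, its `sensitivity` input (a per-slice domain-sensitivity score, e.g. derived from the auxiliary classifier), and the fixed `drop_ratio` are all hypothetical simplifications of the learned, progressively scheduled mechanism described in the abstract.

```python
import numpy as np

def adaptive_drop(x, sensitivity, drop_ratio=0.25, axis=0):
    """Hypothetical sketch of the CD-Unit / TD-Unit idea.

    Zeroes the most domain-sensitive slices of `x` along `axis` and
    rescales the remaining slices so the expected activation magnitude
    is preserved (inverted-dropout convention).

    x           : feature tensor, e.g. (C, H, W) channels or (N, D) tokens
    sensitivity : 1-D array of per-slice domain-sensitivity scores
                  (assumed to come from an auxiliary domain classifier)
    drop_ratio  : fraction of slices to drop (fixed here; the paper uses
                  a progressive training schedule instead)
    """
    n = x.shape[axis]
    k = int(round(drop_ratio * n))
    # Indices of the k slices the domain classifier finds most discriminative.
    drop_idx = np.argsort(sensitivity)[-k:]
    mask = np.ones(n, dtype=x.dtype)
    mask[drop_idx] = 0.0
    # Broadcast the mask along the chosen axis.
    shape = [1] * x.ndim
    shape[axis] = n
    keep_frac = mask.sum() / n
    return x * mask.reshape(shape) / keep_frac

# CD-Unit-style use: drop domain-sensitive channels of a (C, H, W) map.
fmap = np.ones((4, 8, 8))
channel_sens = np.array([0.1, 0.9, 0.2, 0.8])
out_c = adaptive_drop(fmap, channel_sens, drop_ratio=0.5, axis=0)

# TD-Unit-style use: drop domain-sensitive spatial tokens of an (N, D) sequence.
tokens = np.ones((6, 16))
token_sens = np.arange(6, dtype=float)
out_t = adaptive_drop(tokens, token_sens, drop_ratio=1 / 3, axis=0)
```

In this toy run the two most sensitive channels (indices 1 and 3) are zeroed and the kept channels are scaled to 2.0; likewise the two highest-scoring tokens are zeroed and the kept tokens scale to 1.5. The learned, layer-wise version in the paper replaces the hard top-k selection with classifier-guided, progressively regularized dropout.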