🤖 AI Summary
To address performance degradation in cross-domain facial Action Unit (AU) detection caused by domain shift, this paper proposes the Decoupled Dual-Contrastive Adaptation (D2CA) framework. Methodologically, D2CA introduces the first automatic disentanglement mechanism separating AU-specific and domain-specific factors, thereby partitioning feature subspaces into AU-relevant and AU-irrelevant components; it further integrates image-level and feature-level dual contrastive learning to achieve semantic alignment and scale-controllable cross-domain face synthesis. In terms of contributions, D2CA is the first to jointly model feature disentanglement, dual contrastive learning, and AU-conditioned domain adaptation. Extensive experiments across multiple cross-domain settings demonstrate an average F1-score improvement of 6–14% over state-of-the-art methods. Moreover, the synthesized faces exhibit both high visual fidelity and strong AU semantic preservation.
📝 Abstract
Despite the impressive performance of current vision-based facial action unit (AU) detection approaches, they are heavily susceptible to the variations across different domains and the cross-domain AU detection methods are under-explored. In response to this challenge, we propose a decoupled doubly contrastive adaptation (D2CA) approach to learn a purified AU representation that is semantically aligned for the source and target domains. Specifically, we decompose latent representations into AU-relevant and AU-irrelevant components, with the objective of exclusively facilitating adaptation within the AU-relevant subspace. To achieve the feature decoupling, D2CA is trained to disentangle AU and domain factors by assessing the quality of synthesized faces in cross-domain scenarios when either AU or domain attributes are modified. To further strengthen feature decoupling, particularly in scenarios with limited AU data diversity, D2CA employs a doubly contrastive learning mechanism comprising image and feature-level contrastive learning to ensure the quality of synthesized faces and mitigate feature ambiguities. This new framework leads to an automatically learned, dedicated separation of AU-relevant and domain-relevant factors, and it enables intuitive, scale-specific control of the cross-domain facial image synthesis. Extensive experiments demonstrate the efficacy of D2CA in successfully decoupling AU and domain factors, yielding visually pleasing cross-domain synthesized facial images. Meanwhile, D2CA consistently outperforms state-of-the-art cross-domain AU detection approaches, achieving an average F1 score improvement of 6%-14% across various cross-domain scenarios.