🤖 AI Summary
This study investigates whether facial expression dynamics alone, without static appearance cues, encode discriminative identity information. To isolate dynamic identity signals, we use the FLAME 3D morphable model to disentangle expression and jaw motion parameters from identity-related shape parameters. We introduce the drift-to-noise ratio (DNR) to quantify disentanglement reliability and show empirically that it correlates strongly and negatively with downstream identity recognition performance. Building on this, we train a Conformer-based architecture with supervised contrastive learning, achieving 61.14% top-1 accuracy on 1,429 identities in the CANDOR dataset, far above the random baseline (0.13%). This work provides the first systematic evidence that pure facial dynamics are strongly person-specific, suggesting a route toward texture-free, low-resolution identity recognition.
📝 Abstract
This work investigates whether individuals can be identified solely from the dynamic components of their facial expressions, independent of static facial appearance. We leverage the FLAME 3D morphable model to explicitly disentangle facial shape from expression dynamics, extracting frame-by-frame parameters from conversational videos and retaining only the expression and jaw coefficients. On the CANDOR dataset of 1,429 speakers in naturalistic conversations, our Conformer model trained with supervised contrastive learning achieves 61.14% accuracy on 1,429-way classification (458 times above chance), demonstrating that facial dynamics carry strong identity signatures. We introduce a drift-to-noise ratio (DNR) that quantifies the reliability of shape-expression separation by measuring across-session shape changes relative to within-session variability. DNR correlates strongly and negatively with recognition performance, confirming that unstable shape estimation compromises dynamic identification. Our findings reveal person-specific signatures in conversational facial dynamics, with implications for social perception and clinical assessment.
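The abstract describes DNR only in words: across-session shape change relative to within-session variability. The sketch below shows one way such a ratio could be computed for a single speaker. It is an illustration under stated assumptions, not the paper's definition: the function names, the use of Euclidean distance between session-mean shape vectors as "drift", and the RMS deviation from the session mean as "noise" are all hypothetical choices.

```python
import math
from itertools import combinations


def session_mean(frames):
    """Mean shape vector over the frames of one session."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def within_session_noise(frames):
    """RMS Euclidean distance of each frame's shape vector from the session mean."""
    mu = session_mean(frames)
    sq = [sum((x - m) ** 2 for x, m in zip(f, mu)) for f in frames]
    return math.sqrt(sum(sq) / len(sq))


def drift_to_noise_ratio(sessions):
    """Hypothetical DNR for one speaker (assumed formulation, not the paper's).

    sessions: list of sessions, each a list of per-frame shape vectors.
    Drift = mean pairwise distance between session-mean shape vectors.
    Noise = mean within-session RMS deviation.
    """
    if len(sessions) < 2:
        raise ValueError("need at least two sessions to measure drift")
    means = [session_mean(s) for s in sessions]
    drifts = [math.dist(a, b) for a, b in combinations(means, 2)]
    noises = [within_session_noise(s) for s in sessions]
    mean_noise = sum(noises) / len(noises)
    if mean_noise == 0:
        raise ValueError("within-session variability is zero")
    return (sum(drifts) / len(drifts)) / mean_noise


# Toy data: two sessions, two frames each, 2-D "shape" vectors.
stable = [[[0.0, 0.0], [2.0, 0.0]], [[0.0, 0.0], [2.0, 0.0]]]
drifting = [[[0.0, 0.0], [2.0, 0.0]], [[3.0, 0.0], [5.0, 0.0]]]

print(drift_to_noise_ratio(stable))    # 0.0
print(drift_to_noise_ratio(drifting))  # 3.0
```

Under this formulation, a speaker whose estimated shape is identical across sessions scores 0, while a speaker whose session means differ by three times the frame-level spread scores 3. A higher value indicates less stable shape estimation, which by the abstract's finding would be expected to go with lower recognition performance.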