🤖 AI Summary
To address the lack of interpretability and user trust that hinders deployment of facial expression recognition (FER) in real-world settings, this paper introduces a novel multimodal eXplainable AI (XAI) framework grounded in Facial Action Units (FAUs). The approach comprises three components: an FAU-based representation of the model's behavior, cross-modal explanation generation (textual descriptions and visual saliency heatmaps), and an empirical user study evaluating understanding and trust against state-of-the-art XAI baselines. The results show that FAU-based explanations improve users' comprehension of model decisions, with the combined visual-and-textual and textual-only variants performing best, and that all FAU-based modalities help calibrate trust appropriately, mitigating both over-trust and under-trust. This work provides an empirical benchmark for XAI in FER and practical guidance for deploying trustworthy affective computing systems.
📝 Abstract
Facial expression recognition (FER) has emerged as a promising approach to developing emotion-aware intelligent agents and systems. However, key challenges remain in deploying FER in real-world contexts, including ensuring user understanding and establishing an appropriate level of user trust. We developed a novel explanation method that uses Facial Action Units (FAUs) to explain the output of an FER model through both textual and visual modalities. We conducted an empirical user study evaluating user understanding and trust, comparing our approach against state-of-the-art eXplainable AI (XAI) methods. Our results indicate that combined visual-and-textual as well as textual-only FAU-based explanations led to better user understanding of the FER model. We also show that all modalities of FAU-based explanations improved users' appropriate trust in the FER model.
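For intuition, here is a minimal sketch of how an FAU-based textual explanation might be generated: Action Unit activations detected in the input face are matched against the FAUs characteristic of the predicted emotion and then verbalized. The emotion-to-FAU mapping, activation threshold, and function names below are illustrative assumptions, not the paper's actual implementation (the FAU names themselves follow standard FACS coding).

```python
# Hypothetical sketch of FAU-based textual explanation generation.
# The mappings and threshold are illustrative, not from the paper.

# Human-readable names for a few FACS Action Units.
FAU_NAMES = {
    "AU6": "cheek raiser",
    "AU12": "lip corner puller",
    "AU4": "brow lowerer",
    "AU15": "lip corner depressor",
}

# Assumed mapping from predicted emotions to their characteristic FAUs.
EMOTION_FAUS = {
    "happiness": ["AU6", "AU12"],
    "sadness": ["AU4", "AU15"],
}

def explain_prediction(emotion: str, fau_activations: dict,
                       threshold: float = 0.5) -> str:
    """Build a textual explanation from FAUs that are both characteristic
    of the predicted emotion and sufficiently activated in the input."""
    active = [
        f"{au} ({FAU_NAMES[au]})"
        for au in EMOTION_FAUS.get(emotion, [])
        if fau_activations.get(au, 0.0) >= threshold
    ]
    if not active:
        return (f"The model predicted '{emotion}', but none of its "
                f"characteristic FAUs were strongly activated.")
    return (f"The model predicted '{emotion}' because it detected "
            + " and ".join(active) + ".")

# Example usage with made-up activation scores.
print(explain_prediction("happiness", {"AU6": 0.81, "AU12": 0.92}))
# -> The model predicted 'happiness' because it detected
#    AU6 (cheek raiser) and AU12 (lip corner puller).
```

In the paper's framework, an explanation of this textual form is paired with a visual saliency heatmap over the corresponding facial regions to yield the multimodal explanation evaluated in the user study.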