🤖 AI Summary
This study addresses the fine-grained discrimination between genuine and posed smiles, a critical challenge in affective computing. We propose a physiology-inspired multimodal fusion framework that jointly leverages handcrafted D-Marker-based features and Transformer-derived deep representations. Crucially, we introduce a parameter-free Hadamard multiplicative fusion mechanism that enables efficient and interpretable interaction between these complementary feature modalities. Because the fusion step adds no learnable parameters, the model uses 26% fewer parameters than multi-task alternatives while gaining discriminative capacity. Extensive evaluations on four benchmark datasets (UvA-NEMO, MMI, SPOS, and BBC) yield state-of-the-art accuracies of 88.7%, 99.7%, 98.5%, and 100%, respectively. These results surpass existing methods across the board and empirically validate the efficacy of integrating physiological priors with deep learning for robust genuine smile recognition.
📝 Abstract
The distinction between genuine and posed emotions is a fundamental pattern recognition challenge with significant implications for data mining applications in social sciences, healthcare, and human-computer interaction. While recent multi-task learning frameworks have shown promise in combining deep learning architectures with handcrafted D-Marker features for recognizing genuine versus posed smiles, these approaches suffer computational inefficiencies due to auxiliary task supervision and complex loss balancing requirements. This paper introduces HadaSmileNet, a novel feature fusion framework that directly integrates transformer-based representations with physiologically grounded D-Markers through parameter-free multiplicative interactions. Through a systematic evaluation of 15 fusion strategies, we demonstrate that Hadamard multiplicative fusion performs best, enabling direct feature interactions while maintaining computational efficiency. The proposed approach establishes new state-of-the-art results for deep learning methods across four benchmark datasets: UvA-NEMO (88.7%, +0.8), MMI (99.7%), SPOS (98.5%, +0.7), and BBC (100%, +5.0). Comprehensive computational analysis reveals a 26% parameter reduction and simplified training compared to multi-task alternatives, while feature visualization demonstrates enhanced discriminative power through direct domain knowledge integration. The framework's efficiency and effectiveness make it particularly suitable for practical deployment in multimedia data mining applications that require real-time affective computing capabilities.
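To make the fusion step concrete, the following is a minimal PyTorch-style sketch of parameter-free Hadamard (element-wise) fusion followed by a linear classifier. It assumes the transformer embedding and the D-Marker feature vector have already been brought to a common dimension; the class and variable names (`HadamardFusionHead`, `deep_feat`, `dmarker_feat`) are illustrative and not taken from the paper's released code.

```python
import torch
import torch.nn as nn

class HadamardFusionHead(nn.Module):
    """Sketch of parameter-free Hadamard fusion of two same-dimensional
    feature vectors, followed by a linear genuine-vs-posed classifier.

    The fusion itself (element-wise product) introduces no learnable
    parameters; only the final classifier is trained.
    """

    def __init__(self, dim: int, num_classes: int = 2):
        super().__init__()
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, deep_feat: torch.Tensor, dmarker_feat: torch.Tensor) -> torch.Tensor:
        # Parameter-free multiplicative interaction between modalities.
        fused = deep_feat * dmarker_feat  # (batch, dim)
        return self.classifier(fused)


# Toy usage with random tensors standing in for real features.
head = HadamardFusionHead(dim=256)
deep = torch.randn(8, 256)   # e.g., transformer embeddings
dmk = torch.randn(8, 256)    # e.g., D-Marker features projected to dim 256
logits = head(deep, dmk)     # shape: (8, 2)
```

Compared with concatenation or attention-based fusion, the multiplicative interaction lets each D-Marker dimension directly gate the corresponding deep feature, which is consistent with the interpretability and efficiency claims above.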