🤖 AI Summary
Facial expression classification faces challenges including difficulty in modeling high-dimensional image features and insufficient discriminability. This paper proposes Hy-Facial, the first framework to systematically integrate deep features from VGG19 with handcrafted SIFT and ORB features. To preserve multi-scale structural information, it introduces a joint dimensionality reduction strategy combining K-means pre-clustering and UMAP. Unlike PCA or t-SNE, UMAP achieves a superior balance between local neighborhood preservation and global manifold structure modeling, thereby significantly enhancing feature separability. Evaluated on the FER-2013 benchmark, Hy-Facial achieves 83.3% classification accuracy—demonstrating the synergistic advantage of hybrid feature representation and UMAP-driven dimensionality reduction. The framework establishes a novel paradigm for lightweight, efficient facial expression recognition, offering improved performance without excessive computational overhead.
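The hybrid representation described above amounts to concatenating a deep feature vector with handcrafted descriptor statistics. A minimal sketch of that fusion step, using randomly generated stand-ins for the VGG19, SIFT, and ORB features (the real pipeline would extract these from face images; the dimensions below are illustrative assumptions, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for one image's features. In the actual pipeline these
# would come from a VGG19 forward pass and pooled SIFT/ORB descriptors.
vgg19_feat = rng.normal(size=4096)  # deep CNN feature vector
sift_feat = rng.normal(size=128)    # pooled SIFT descriptor
orb_feat = rng.normal(size=32)      # pooled ORB descriptor

# Hybrid representation: simple concatenation of the three sources,
# yielding a single high-dimensional vector per image.
hybrid = np.concatenate([vgg19_feat, sift_feat, orb_feat])
print(hybrid.shape)  # (4256,)
```

The resulting high dimensionality (thousands of components per image) is precisely what motivates the dimensionality reduction stage that follows.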
📝 Abstract
Facial expression classification remains a challenging task due to the high dimensionality and inherent complexity of facial image data. This paper presents Hy-Facial, a hybrid feature extraction framework that integrates both deep learning and traditional image processing techniques, complemented by a systematic investigation of dimensionality reduction strategies. The proposed method fuses deep features extracted from the Visual Geometry Group 19-layer network (VGG19) with handcrafted local descriptors from the scale-invariant feature transform (SIFT) and Oriented FAST and Rotated BRIEF (ORB) algorithms, to obtain rich and diverse image representations. To mitigate feature redundancy and reduce computational complexity, we conduct a comprehensive evaluation of dimensionality reduction techniques applied to the extracted features. Among these, uniform manifold approximation and projection (UMAP) is identified as the most effective, preserving both the local and global structure of the high-dimensional feature space. The Hy-Facial pipeline integrates VGG19, SIFT, and ORB for feature extraction, followed by K-means clustering and UMAP for dimensionality reduction, achieving a classification accuracy of 83.3% on the facial expression recognition (FER-2013) dataset. These findings underscore the pivotal role of dimensionality reduction not only as a pre-processing step but also as an essential component in improving feature quality and overall classification performance.