🤖 AI Summary
This work addresses the poor interpretability and high computational cost of deep learning models in image classification. We propose a lightweight, interpretable classification framework based on handcrafted feature fusion. Specifically, we extend one-dimensional permutation entropy (PE) to two-dimensional images, constructing multi-scale and multi-directional entropy features; these are then complementarily fused with Histogram of Oriented Gradients (HOG) and Local Binary Patterns (LBP) to form a highly discriminative feature set. Classification is performed using a Support Vector Machine (SVM) optimized via grid search for hyperparameter tuning. Evaluated on four benchmark datasets—Fashion-MNIST, KMNIST, EMNIST, and CIFAR-10—the method achieves classification accuracy comparable to or better than state-of-the-art lightweight CNNs, while incurring significantly lower computational overhead. Results demonstrate that entropy-driven feature engineering effectively balances interpretability, computational efficiency, and generalization capability.
📝 Abstract
Feature engineering continues to play a critical role in image classification, particularly when interpretability and computational efficiency are prioritized over deep learning models with millions of parameters. In this study, we revisit classical machine-learning-based image classification through a novel approach centered on Permutation Entropy (PE), a robust and computationally lightweight measure traditionally used in time-series analysis but rarely applied to image data. We extend PE to two-dimensional images and propose a multi-scale, multi-orientation entropy-based feature extraction approach that characterizes spatial order and complexity along the rows, columns, diagonals, anti-diagonals, and local patches of an image. To enhance the discriminative power of the entropy features, we integrate two classic image descriptors: the Histogram of Oriented Gradients (HOG), which captures shape and edge structure, and Local Binary Patterns (LBP), which encode micro-texture. The resulting hand-crafted feature set, comprising 780 dimensions, is used to train Support Vector Machine (SVM) classifiers optimized through grid search. The proposed approach is evaluated on multiple benchmark datasets, including Fashion-MNIST, KMNIST, EMNIST, and CIFAR-10, where it delivers competitive classification performance without relying on deep architectures. Our results demonstrate that the fusion of PE with HOG and LBP provides a compact, interpretable, and effective alternative to computationally expensive deep learning models with limited interpretability. These findings highlight the potential of entropy-based descriptors and contribute a lightweight, generalizable solution for interpretable image classification in computer vision.
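To make the core idea concrete, the directional extension of permutation entropy described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`permutation_entropy`, `image_pe_features`), the embedding dimension `m=3`, and the choice of averaging row/column entropies are assumptions for demonstration, and the paper's full 780-dimensional feature set additionally fuses HOG, LBP, and local-patch entropies, which are omitted here.

```python
import numpy as np
from math import factorial

def permutation_entropy(x, m=3, delay=1):
    """Normalized permutation entropy (Bandt-Pompe) of a 1-D sequence.

    Counts the ordinal patterns of all length-m embedded windows and
    returns their Shannon entropy, normalized by log(m!) to lie in [0, 1].
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * delay
    if n <= 0:
        return 0.0
    counts = {}
    for i in range(n):
        window = x[i:i + (m - 1) * delay + 1:delay]
        # Stable argsort gives a deterministic ordinal pattern even with ties
        pattern = tuple(np.argsort(window, kind="stable"))
        counts[pattern] = counts.get(pattern, 0) + 1
    probs = np.array(list(counts.values()), dtype=float) / n
    h = -np.sum(probs * np.log(probs))
    return h / np.log(factorial(m))

def image_pe_features(img, m=3):
    """Directional PE features of a 2-D image: mean row PE, mean column PE,
    and PE along the main diagonal and anti-diagonal."""
    img = np.asarray(img, dtype=float)
    row_pe = np.mean([permutation_entropy(r, m) for r in img])
    col_pe = np.mean([permutation_entropy(c, m) for c in img.T])
    diag_pe = permutation_entropy(np.diagonal(img), m)
    anti_pe = permutation_entropy(np.diagonal(np.fliplr(img)), m)
    return np.array([row_pe, col_pe, diag_pe, anti_pe])
```

In this sketch a textured image yields entropies near 1 (many ordinal patterns occur) while a constant image yields 0 (a single pattern dominates), which is the kind of spatial-complexity contrast the entropy features are meant to capture before being concatenated with HOG and LBP descriptors.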