🤖 AI Summary
This study investigates whether classical machine learning models driven by handcrafted features can, by virtue of feature engineering, resist transfer-based adversarial attacks generated by deep neural networks. On CIFAR-10, the authors systematically evaluate the transferability of FGSM and PGD attacks against KNN, decision trees, linear/kernel SVMs, and shallow neural networks, using HOG features combined with various block normalization schemes. The work reveals, for the first time, that adversarial vulnerability is pervasive across computational paradigms: all classical models suffer accuracy drops of 16.6%–59.1%, comparable to those observed in deep networks. Notably, an "attack-level reversal" phenomenon emerges: FGSM proves more destructive than PGD, challenging the prevailing assumption that iterative attacks are stronger. These findings suggest that adversarial fragility is an intrinsic property of image classification systems, unlikely to be fundamentally mitigated by feature engineering alone.
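The handcrafted-feature bottleneck at the center of the study, gradient quantization into orientation bins followed by block normalization, can be sketched in a few lines. This is an illustrative numpy-only HOG pipeline with assumed cell/bin/block sizes (8×8 cells, 9 bins, 2×2 L2-normalized blocks), not the paper's exact eight configurations:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((32, 32))               # grayscale CIFAR-10-sized image

# 1. Image gradients via simple centered finite differences.
gx = np.zeros_like(img); gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
gy = np.zeros_like(img); gy[1:-1, :] = img[2:, :] - img[:-2, :]
mag = np.hypot(gx, gy)                   # gradient magnitude
ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned orientation in [0, 180)

# 2. Quantize each 8x8 cell into a 9-bin orientation histogram,
#    weighting each pixel's vote by its gradient magnitude.
cell, nbins = 8, 9
cells = np.zeros((4, 4, nbins))
for i in range(4):
    for j in range(4):
        m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
        a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
        bins = np.minimum((a / (180 / nbins)).astype(int), nbins - 1)
        cells[i, j] = np.bincount(bins, weights=m, minlength=nbins)

# 3. L2 block normalization over overlapping 2x2 groups of cells --
#    the mitigation knob the study finds partially but insufficiently helpful.
blocks = []
for i in range(3):
    for j in range(3):
        v = cells[i:i+2, j:j+2].ravel()
        blocks.append(v / (np.linalg.norm(v) + 1e-6))
feat = np.concatenate(blocks)            # final HOG descriptor

assert feat.shape == (3 * 3 * 4 * nbins,)  # 3x3 blocks * 4 cells * 9 bins = 324
```

The quantization in step 2 and the normalization in step 3 are exactly the lossy operations hypothesized to filter out high-frequency adversarial signal; the study's result is that they do not.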
📝 Abstract
Deep neural networks are vulnerable to adversarial examples: inputs with imperceptible perturbations that cause misclassification. While adversarial transfer within neural networks is well documented, whether classical ML pipelines using handcrafted features inherit this vulnerability when attacked via neural surrogates remains unexplored. Feature engineering creates information bottlenecks through gradient quantization and spatial binning, potentially filtering out high-frequency adversarial signals. We evaluate this protective hypothesis through the first comprehensive study of adversarial transfer from DNNs to HOG-based classifiers. Using VGG16 as a surrogate, we generate FGSM and PGD adversarial examples and test their transfer to four classical classifiers (KNN, Decision Tree, Linear SVM, Kernel SVM) and a shallow neural network across eight HOG configurations on CIFAR-10. Our results strongly refute the protective hypothesis: all classifiers suffer relative accuracy drops of 16.6%-59.1%, comparable to neural-to-neural transfer. More surprisingly, we discover an attack hierarchy reversal: contrary to the pattern within neural networks, where iterative PGD dominates FGSM, FGSM causes greater degradation than PGD in 100% of classical ML cases, suggesting that iterative attacks overfit to surrogate-specific features that do not survive feature extraction. Block normalization provides partial but insufficient mitigation. These findings demonstrate that adversarial vulnerability is not an artifact of end-to-end differentiability but a fundamental property of image classification systems, with implications for security-critical deployments across computational paradigms.
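The two transferred attacks differ only in iteration: FGSM takes one signed-gradient step, while PGD takes many small steps, each projected back into the ε-ball. A minimal numpy sketch, using a toy linear softmax classifier as a stand-in surrogate (the paper's actual surrogate is VGG16; all shapes and budgets here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 3072)) * 0.01   # 10 classes, 32x32x3 image flattened
x = rng.random(3072)                     # clean image scaled to [0, 1]
y = 3                                    # true label

def input_grad(x, y):
    """Gradient of the cross-entropy loss of softmax(W @ x) w.r.t. the input."""
    z = W @ x
    z = z - z.max()                      # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    onehot = np.zeros(10)
    onehot[y] = 1.0
    return W.T @ (p - onehot)

eps = 8 / 255                            # common L-inf budget on CIFAR-10

# FGSM: a single signed-gradient step of size eps.
x_fgsm = np.clip(x + eps * np.sign(input_grad(x, y)), 0.0, 1.0)

# PGD: iterated small steps, each projected back onto the eps-ball around x.
x_pgd = x.copy()
alpha = eps / 4                          # illustrative step size
for _ in range(10):
    x_pgd = x_pgd + alpha * np.sign(input_grad(x_pgd, y))
    x_pgd = np.clip(x_pgd, x - eps, x + eps)  # project into the eps-ball
    x_pgd = np.clip(x_pgd, 0.0, 1.0)          # stay a valid image

# Both perturbations respect the same L-inf budget.
assert np.abs(x_fgsm - x).max() <= eps + 1e-9
assert np.abs(x_pgd - x).max() <= eps + 1e-9
```

The extra iterations let PGD follow the surrogate's loss surface closely, which is precisely the mechanism the abstract's overfitting explanation invokes: the finer, surrogate-specific structure PGD exploits is what fails to survive HOG feature extraction.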