🤖 AI Summary
This work addresses the detection of adversarial perturbations during real-time inference when ground-truth labels are unavailable. It studies Volatility in Certainty (VC), a recently proposed, label-free confidence-anomaly metric that quantifies local irregularities in output smoothness as the mean squared log-ratio of adjacent certainty values in the sorted softmax outputs. VC is architecture-agnostic and computationally lightweight, which makes it suitable for unsupervised, online monitoring of adversarial drift. Experiments on MNIST and CIFAR-10 show that log(VC) is strongly negatively correlated with classification accuracy (Spearman ρ < −0.90 in most cases), so it can serve as an early-warning signal for performance degradation. The method thus supports timely defensive interventions in safety-critical systems without requiring ground-truth labels or model retraining.
📝 Abstract
Adversarial robustness remains a critical challenge in deploying neural network classifiers, particularly in real-time systems where ground-truth labels are unavailable during inference. This paper investigates Volatility in Certainty (VC), a recently proposed, label-free metric that quantifies irregularities in model confidence by measuring the dispersion of sorted softmax outputs. Specifically, VC is defined as the average squared log-ratio of adjacent certainty values, capturing local fluctuations in the smoothness of the model's output. We evaluate VC both as a proxy for classification accuracy and as an indicator of adversarial drift. Experiments are conducted on artificial neural networks (ANNs) and convolutional neural networks (CNNs) trained on MNIST, as well as on a regularized VGG-like model trained on CIFAR-10. Adversarial examples are generated using the Fast Gradient Sign Method (FGSM) across a range of perturbation magnitudes. In addition, mixed test sets are created by gradually introducing adversarial contamination to assess VC's sensitivity to incremental distribution shifts. Our results reveal a strong negative correlation between classification accuracy and log(VC) (ρ < −0.90 in most cases), suggesting that VC reflects performance degradation without requiring labeled data. These findings position VC as a scalable, architecture-agnostic, real-time performance metric suitable for early-warning systems in safety-critical applications.
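The definition above can be made concrete with a minimal sketch. It rests on assumptions that the abstract does not spell out: the "certainty" of a sample is taken to be its maximum softmax probability, the certainty values of a batch are sorted in ascending order, and VC is the mean of the squared log-ratios of neighbouring values. The function names `softmax` and `volatility_in_certainty` and the `eps` guard are illustrative, not taken from the paper.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def volatility_in_certainty(probs, eps=1e-12):
    """Mean squared log-ratio of adjacent sorted certainty values.

    Assumption: certainty = max softmax probability per sample, and the
    sort runs across the samples of the batch. `eps` (hypothetical) only
    guards against log(0).
    """
    probs = np.asarray(probs, dtype=float)
    if probs.ndim != 2 or probs.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two samples")
    certainty = np.sort(probs.max(axis=1))        # ascending certainty values
    certainty = np.clip(certainty, eps, None)
    log_ratio = np.diff(np.log(certainty))        # log(c[i+1] / c[i])
    return float(np.mean(log_ratio ** 2))


if __name__ == "__main__":
    # Synthetic logits stand in for a real classifier's outputs.
    rng = np.random.default_rng(0)
    logits = rng.normal(scale=3.0, size=(1000, 10))
    vc = volatility_in_certainty(softmax(logits))
    print("log(VC) =", np.log(vc))
```

No labels enter the computation, so under these assumptions log(VC) could be tracked per batch at inference time and compared against a baseline recorded on clean data.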