🤖 AI Summary
This work addresses the vulnerability of conventional open-set recognition (OSR) familiarity scores—such as the Maximum Softmax Probability (MSP) and the Maximum Logit Score (MLS)—to adversarial attacks. It identifies and characterizes two threat classes, “false familiarity” and “false novelty” attacks, both of which severely degrade OSR reliability. To mitigate this, the authors propose the Adversarial Reaction Score (ARS), a novel OSR scoring rule based on how a model’s output reacts to gradient-based perturbations. ARS correlates strongly with MLS yet is substantially more robust under attack. Empirical evaluation on TinyImageNet, using gradient-based class-targeted adversarial attacks with statistical validation, shows that standard familiarity scores degrade sharply under attack, whereas ARS achieves an average 12.3% improvement in AUC over MLS under false novelty attacks, improving the reliable discrimination of unknown classes.
📝 Abstract
Open-set recognition (OSR), the identification of novel categories, can be a critical component when deploying classification models in real-world applications. Recent work has shown that familiarity-based scoring rules such as the Maximum Softmax Probability (MSP) or the Maximum Logit Score (MLS) are strong baselines when the closed-set accuracy is high. However, one potential weakness of familiarity-based OSR is adversarial attacks. Here, we study both types of gradient-based adversarial attacks on familiarity scores, False Familiarity and False Novelty attacks, and evaluate their effectiveness in informed and uninformed settings on TinyImageNet. Furthermore, we explore how novel and familiar samples react to adversarial attacks and formulate the adversarial reaction score as an alternative OSR scoring rule, which shows a high correlation with the MLS familiarity score.
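As a rough illustration (not taken from the paper), the two familiarity scores and a one-step false-familiarity attack can be sketched on a toy linear classifier, where the gradient of the max logit with respect to the input is available in closed form. All names here (`msp`, `mls`, `false_familiarity_attack`, the toy weights `W`, `b`) are hypothetical stand-ins, not the authors' implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def msp(logits):
    """Maximum Softmax Probability familiarity score."""
    return softmax(logits).max()

def mls(logits):
    """Maximum Logit Score familiarity score."""
    return logits.max()

# Toy linear classifier standing in for a deep network: logits = W @ x + b.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 16))   # 5 known classes, 16 input features
b = np.zeros(5)
x = rng.normal(size=16)        # a (possibly novel) input sample

def false_familiarity_attack(x, eps=0.1):
    """One FGSM-style sign-gradient step that raises the max logit,
    making the sample look more familiar. For this linear model the
    gradient of the max logit w.r.t. x is the winning row of W."""
    k = np.argmax(W @ x + b)
    return x + eps * np.sign(W[k])   # gradient ascent on the MLS score

x_adv = false_familiarity_attack(x)
print("MLS before:", mls(W @ x + b), " after attack:", mls(W @ x_adv + b))
```

Flipping the sign of the step (gradient descent on the max logit) gives the corresponding false-novelty direction, which pushes a familiar sample toward a low familiarity score instead.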