🤖 AI Summary
This paper addresses a fairness deficiency in large-scale evaluation of open-set recognition (OSR): overall metrics neglect inter-class performance disparities, masking classes where unknown samples are systematically assigned high confidence. To remedy this, the authors propose GHOST, the first hyperparameter-free Gaussian modeling framework for OSR. The method operates on deep features and logits, where (1) class-conditional multivariate Gaussian models with diagonal covariance capture known-class feature distributions; (2) Z-score normalization of logits suppresses interference from feature magnitudes that deviate from the model's expectations; and (3) evaluation across multiple pre-trained networks demonstrates generalization. Evaluated on ImageNet-1K pre-trained backbones against four diverse unknown-class datasets, the approach achieves state-of-the-art performance on three key metrics (AUOSCR, AUROC, and FPR95) with statistically significant improvements over baselines. The framework eliminates reliance on task-specific hyperparameters while improving discrimination between known and unknown classes.
📝 Abstract
Evaluations of large-scale recognition methods typically focus on overall performance. While common, this approach often fails to provide insight into performance across individual classes, which can lead to fairness issues and misrepresentation. Addressing these gaps is crucial for accurately assessing how well methods handle novel or unseen classes and for ensuring a fair evaluation. To address fairness in Open-Set Recognition (OSR), we demonstrate that per-class performance can vary dramatically. We introduce the Gaussian Hypothesis Open Set Technique (GHOST), a novel hyperparameter-free algorithm that models deep features using class-wise multivariate Gaussian distributions with diagonal covariance matrices. We apply Z-score normalization to logits to mitigate the impact of feature magnitudes that deviate from the model's expectations, thereby reducing the likelihood of the network assigning a high score to an unknown sample. We evaluate GHOST across multiple ImageNet-1K pre-trained deep networks and test it with four different unknown datasets. Using standard metrics such as AUOSCR, AUROC, and FPR95, we achieve statistically significant improvements, advancing the state-of-the-art in large-scale OSR. Source code is provided online.
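The abstract's two core ingredients, class-wise diagonal-covariance Gaussians over deep features and Z-score normalization of the logit, can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the function names, the use of the predicted class's Gaussian, and the specific way the mean absolute Z-score rescales the max logit are all hypothetical choices for exposition.

```python
import numpy as np

def fit_class_gaussians(features, labels, num_classes):
    """Fit a per-class mean and std over deep features.

    Per-dimension std corresponds to a multivariate Gaussian with a
    diagonal covariance matrix, as described in the abstract.
    """
    dim = features.shape[1]
    means = np.zeros((num_classes, dim))
    stds = np.zeros((num_classes, dim))
    for c in range(num_classes):
        fc = features[labels == c]
        means[c] = fc.mean(axis=0)
        stds[c] = fc.std(axis=0) + 1e-8  # guard against zero variance
    return means, stds

def open_set_score(feature, logits, means, stds):
    """Hypothetical scoring rule: down-weight the max logit by the
    feature's deviation (mean absolute Z-score) from the predicted
    class's Gaussian. A feature far from every known-class model
    yields a large Z-score and thus a low score, flagging an unknown.
    """
    c = int(np.argmax(logits))
    z = np.abs((feature - means[c]) / stds[c])
    return logits[c] / (1.0 + z.mean())
```

On synthetic data, a test feature near a known class's mean keeps most of its logit value, while an out-of-distribution feature with large per-dimension Z-scores is heavily attenuated, so thresholding the score separates known from unknown samples without any task-specific hyperparameter.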