🤖 AI Summary
This work challenges the prevailing assumption that higher in-distribution accuracy necessarily improves out-of-distribution (OOD) detection performance. By systematically evaluating 56 modern training strategies on a fixed ResNet-50 architecture trained on ImageNet, the study assesses model behavior across eight OOD datasets in conjunction with 21 post-hoc detection methods. The analysis reveals a non-monotonic relationship between in-distribution accuracy and OOD detection efficacy: certain models achieving high in-distribution accuracy actually exhibit degraded OOD detection capabilities. These findings demonstrate that no single OOD detector is universally optimal; instead, detection performance critically depends on the joint choice of training strategy and detection method, offering new principles for robust deployment in open-world settings.
📝 Abstract
Out-of-distribution (OOD) detection is crucial for deploying robust and reliable machine-learning systems in open-world settings. Despite steady advances in OOD detectors, their interplay with modern training pipelines that maximize in-distribution (ID) accuracy and generalization remains under-explored. We investigate this link through a comprehensive empirical study. Fixing the architecture to the widely adopted ResNet-50, we benchmark 21 state-of-the-art post-hoc OOD detection methods across 56 ImageNet-trained models obtained via diverse training strategies and evaluate them on eight OOD test sets. Contrary to the common assumption that higher ID accuracy implies better OOD detection performance, we uncover a non-monotonic relationship: OOD performance initially improves with accuracy but declines once advanced training recipes push accuracy beyond the baseline. Moreover, we observe a strong interdependence between training strategy, detector choice, and resulting OOD performance, indicating that no single method is universally optimal.
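To make "post-hoc" concrete: such detectors score a trained classifier's outputs without any retraining. A minimal sketch of the classic maximum softmax probability (MSP) baseline is shown below; this is an illustrative example of the detector family, not necessarily one of the 21 specific methods benchmarked in the paper, and the logit vectors are made up for demonstration.

```python
import math

def msp_score(logits):
    """Maximum softmax probability over the class logits.

    Higher scores suggest the input is more in-distribution-like;
    a threshold on this score flags likely OOD inputs.
    """
    m = max(logits)                              # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return max(e / total for e in exps)

# A peaked logit vector (confident prediction) scores near 1;
# a flat one (uncertain prediction) scores near 1/num_classes.
print(msp_score([10.0, 0.0, 0.0]))  # ~0.9999
print(msp_score([1.0, 1.0, 1.0]))   # ~0.3333
```

In practice the logits would come from the fixed ResNet-50, and the detection threshold would be calibrated on ID validation data; the paper's point is that how that ResNet-50 was trained strongly affects how well any such score separates ID from OOD inputs.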