🤖 AI Summary
This work addresses two core challenges in safe machine learning: out-of-distribution (OOD) detection and uncertainty estimation (UE). It proposes a unified "train-invert-exclude" closed-loop framework: an (n+1)-class classifier is augmented with an explicit garbage class, seeded with Gaussian noise and refreshed each epoch by network inversion of the classifier's output classes. Crucially, the method requires neither external OOD data nor post-hoc calibration; it progressively sanitises the class manifolds while explicitly modeling anomalous inputs. The framework jointly achieves accurate OOD detection and well-calibrated uncertainty quantification, reporting improvements on standard benchmarks in key metrics such as OOD detection AUC and expected calibration error (ECE), while remaining interpretable, end-to-end trainable, and model-agnostic.
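As a minimal sketch of what inference under this scheme could look like, assuming a trained (n+1)-class PyTorch classifier whose last output index is the garbage class (the function and variable names here are illustrative, not from the paper):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict_with_rejection(model, x, garbage_idx):
    """Classify a batch, rejecting inputs routed to the garbage class."""
    probs = F.softmax(model(x), dim=1)  # shape: (batch, n + 1)
    conf, pred = probs.max(dim=1)       # confidence = max softmax probability
    is_ood = pred.eq(garbage_idx)       # garbage-class prediction => reject as OOD
    uncertainty = 1.0 - conf            # simple per-sample uncertainty score
    return pred, is_ood, uncertainty
```

Under this reading, a single forward pass yields the class prediction, the OOD decision, and the uncertainty estimate, which is what lets the framework avoid any post-hoc calibration stage.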
📝 Abstract
Out-of-distribution (OOD) detection and uncertainty estimation (UE) are critical components for building safe machine learning systems, especially in real-world scenarios where unexpected inputs are inevitable. In this work, we propose a novel framework that combines network inversion with classifier training to simultaneously address both OOD detection and uncertainty estimation. For a standard n-class classification task, we extend the classifier to an (n+1)-class model by introducing a "garbage" class, initially populated with random Gaussian noise to represent outlier inputs. After each training epoch, we use network inversion to reconstruct input images corresponding to all output classes; these reconstructions initially appear noisy and incoherent and are therefore excluded into the garbage class before retraining the classifier. This cycle of training, inversion, and exclusion continues iteratively until the inverted samples begin to resemble the in-distribution data more closely, suggesting that the classifier has learned to carve out meaningful decision boundaries while sanitising the class manifolds by pushing OOD content into the garbage class. During inference, this training scheme enables the model to detect and reject OOD samples by classifying them into the garbage class. Furthermore, the confidence score associated with each prediction can be used to estimate uncertainty for both in-distribution and OOD inputs. Our approach is scalable, interpretable, and requires neither external OOD datasets nor post-hoc calibration techniques, providing a unified solution to the dual challenges of OOD detection and uncertainty estimation.
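To make the training loop concrete, here is a hedged PyTorch sketch of the train-invert-exclude cycle, assuming MNIST-shaped inputs. The abstract does not specify the paper's actual network-inversion procedure, so `invert_class` below substitutes a simple gradient-based input reconstruction as a stand-in; the garbage-pool size, mixing ratio, and fixed epoch budget are likewise illustrative assumptions rather than the authors' settings.

```python
import torch
import torch.nn.functional as F

def invert_class(model, class_idx, shape, steps=200, lr=0.05, n=16):
    """Gradient-based stand-in for network inversion: optimise random
    inputs until the classifier assigns them to `class_idx`."""
    x = torch.randn(n, *shape, requires_grad=True)
    target = torch.full((n,), class_idx, dtype=torch.long)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(model(x), target).backward()
        opt.step()
    return x.detach()

def train_invert_exclude(model, loader, n_classes, epochs, shape=(1, 28, 28)):
    garbage_idx = n_classes                   # index of the (n+1)-th class
    garbage_pool = torch.randn(256, *shape)   # seeded with Gaussian noise
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(epochs):
        # 1) Train the (n+1)-class model on in-distribution batches,
        #    each padded with samples drawn from the garbage pool.
        model.train()
        for x, y in loader:
            idx = torch.randint(len(garbage_pool), (max(1, x.size(0) // 4),))
            gx = garbage_pool[idx]
            gy = torch.full((gx.size(0),), garbage_idx, dtype=torch.long)
            opt.zero_grad()
            F.cross_entropy(model(torch.cat([x, gx])),
                            torch.cat([y, gy])).backward()
            opt.step()
        # 2) Invert every in-distribution class, then 3) exclude the
        #    reconstructions into the garbage pool for the next epoch.
        #    A full implementation would stop excluding once the
        #    reconstructions resemble in-distribution data (check elided).
        model.eval()
        inverted = [invert_class(model, c, shape) for c in range(n_classes)]
        garbage_pool = torch.cat([garbage_pool, *inverted])
    return model
```

The design choice worth noting is that the garbage pool only ever grows: early, incoherent inversions stay in it as negatives, so the classifier keeps being penalised for placing off-manifold content inside the in-distribution classes.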