🤖 AI Summary
Unsupervised out-of-distribution (OOD) detection on wild, unlabeled data—where in-distribution (ID) and OOD samples co-occur without annotations—remains challenging due to ID dominance bias and the absence of reliable separation criteria. Method: We propose a threshold-free OOD detection paradigm that actively injects controllable label noise to disrupt ID-dominated learning dynamics, thereby inducing naturally separable clusters of ID and OOD samples in the loss space. Our approach introduces the first "intentional label corruption" mechanism, integrated with loss-space modeling and K-means clustering, requiring neither pure OOD samples nor manually tuned thresholds. Contribution/Results: We theoretically establish that noise-induced loss discrepancies guarantee class separability. Empirically, our method achieves up to 8.2% absolute improvement in OOD detection F1-score over state-of-the-art methods across multiple benchmarks, demonstrating strong robustness and eliminating the need for threshold optimization.
📝 Abstract
Using unlabeled wild data containing both in-distribution (ID) and out-of-distribution (OOD) data to improve the safety and reliability of models has recently received increasing attention. Existing methods either design customized losses for labeled ID and unlabeled wild data and then perform joint optimization, or first filter OOD data out of the wild data and then learn an OOD detector. While achieving varying degrees of success, two potential issues remain: (i) labeled ID data typically dominates the learning of models, inevitably making models tend to fit OOD data as ID; (ii) the selection of thresholds for identifying OOD data in unlabeled wild data usually faces a dilemma due to the unavailability of pure OOD samples. To address these issues, we propose a novel loss-difference OOD detection framework (LoD) that *intentionally label-noisifies* unlabeled wild data. This operation not only enables labeled ID data and the OOD data in unlabeled wild data to jointly dominate the models' learning, but also ensures that the losses of ID and OOD samples in unlabeled wild data remain distinguishable, allowing a classic clustering technique (e.g., K-means) to filter these OOD samples without requiring thresholds. We also provide a theoretical foundation for LoD's viability, and extensive experiments verify its superiority.
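The final filtering step described above—clustering per-sample losses into two groups instead of picking a threshold—can be sketched with a minimal NumPy-only 1-D K-means. This is an illustrative approximation, not the paper's implementation: the function name `split_by_loss`, the toy loss values, and the two-cluster initialization at the extremes are all assumptions, and in the actual LoD pipeline the losses would come from a model trained on the intentionally label-noisified wild data.

```python
import numpy as np

def split_by_loss(losses, n_iter=50):
    """Two-cluster 1-D K-means on per-sample losses (hypothetical sketch).

    Returns an array of cluster labels: 0 for the low-loss cluster,
    1 for the high-loss cluster. Which cluster corresponds to OOD
    depends on the label-noising scheme used upstream.
    """
    losses = np.asarray(losses, dtype=float)
    # Initialize the two centers at the extremes of the loss range.
    centers = np.array([losses.min(), losses.max()])
    for _ in range(n_iter):
        # Assign each sample to its nearest center.
        assign = np.abs(losses[:, None] - centers[None, :]).argmin(axis=1)
        # Recompute each center as the mean of its assigned losses.
        for k in range(2):
            if np.any(assign == k):
                centers[k] = losses[assign == k].mean()
    # Relabel so that cluster 1 is always the higher-loss cluster.
    if centers[0] > centers[1]:
        assign = 1 - assign
    return assign

# Toy example: two well-separated loss groups split without any threshold.
toy_losses = [0.10, 0.20, 0.15, 2.00, 2.10, 1.90]
print(split_by_loss(toy_losses))  # → [0 0 0 1 1 1]
```

The point of the sketch is that no threshold appears anywhere: the boundary between the two groups falls out of the clustering itself, which is what lets LoD avoid the threshold-selection dilemma the abstract describes.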