LoD: Loss-difference OOD Detection by Intentionally Label-Noisifying Unlabeled Wild Data

📅 2025-05-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Unsupervised out-of-distribution (OOD) detection on wild, unlabeled data (where in-distribution (ID) and OOD samples co-occur without annotations) remains challenging due to ID dominance bias and the absence of reliable separation criteria. Method: We propose a threshold-free OOD detection paradigm that actively injects controllable label noise to disrupt ID-dominated learning dynamics, thereby inducing naturally separable clusters of ID and OOD samples in the loss space. Our approach introduces the first "intentional label corruption" mechanism, integrated with loss-space modeling and K-means clustering, requiring neither pure OOD samples nor manually tuned thresholds. Contribution/Results: We theoretically establish that noise-induced loss discrepancies guarantee the separability of ID and OOD samples. Empirically, our method achieves up to an 8.2% absolute improvement in OOD detection F1-score over state-of-the-art methods across multiple benchmarks, demonstrating strong robustness and eliminating the need for threshold optimization.

📝 Abstract
Using unlabeled wild data containing both in-distribution (ID) and out-of-distribution (OOD) data to improve the safety and reliability of models has recently received increasing attention. Existing methods either design customized losses for labeled ID and unlabeled wild data and then perform joint optimization, or first filter OOD data out of the wild data and then learn an OOD detector. While achieving varying degrees of success, two potential issues remain: (i) labeled ID data typically dominates the learning of models, inevitably making them tend to fit OOD data as ID; (ii) selecting thresholds for identifying OOD data in unlabeled wild data usually faces a dilemma due to the unavailability of pure OOD samples. To address these issues, we propose a novel loss-difference OOD detection framework (LoD) by "intentionally label-noisifying" unlabeled wild data. This operation not only enables labeled ID data and the OOD data in unlabeled wild data to jointly dominate the models' learning, but also ensures the distinguishability of the losses between ID and OOD samples in unlabeled wild data, allowing a classic clustering technique (e.g., K-means) to filter these OOD samples without requiring thresholds. We also provide a theoretical foundation for LoD's viability, and extensive experiments verify its superiority.
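The threshold-free filtering step described in the abstract can be sketched as a two-means split over per-sample losses. The snippet below is an illustrative sketch, not the paper's implementation: `split_by_loss` and the synthetic loss values are assumptions, and in LoD the losses would come from a model trained after intentionally noisifying the wild data's labels; which of the two clusters corresponds to OOD depends on how the noise is injected.

```python
import numpy as np

def split_by_loss(losses, n_iter=50):
    """1-D K-means with k=2 over per-sample losses.
    Returns a boolean mask marking the higher-loss cluster.
    Hypothetical helper for illustration only."""
    losses = np.asarray(losses, dtype=float)
    # Initialize the two centroids at the extreme losses.
    c_lo, c_hi = losses.min(), losses.max()
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid.
        hi_mask = np.abs(losses - c_hi) < np.abs(losses - c_lo)
        new_lo = losses[~hi_mask].mean()
        new_hi = losses[hi_mask].mean()
        if np.isclose(new_lo, c_lo) and np.isclose(new_hi, c_hi):
            break  # converged
        c_lo, c_hi = new_lo, new_hi
    return hi_mask

# Synthetic demo: after label noisification, the two groups' losses
# separate cleanly, so no threshold needs hand-tuning.
rng = np.random.default_rng(0)
losses = np.concatenate([rng.normal(0.3, 0.05, 100),  # low-loss group
                         rng.normal(2.0, 0.2, 30)])   # high-loss group
mask = split_by_loss(losses)
print(mask.sum())  # size of the higher-loss cluster
```

Because clustering adapts to the observed loss distribution, the split point is data-driven rather than a manually tuned threshold, which is the property the abstract highlights.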
Problem

Research questions and friction points this paper is trying to address.

Detecting OOD data in unlabeled wild datasets
Reducing model bias towards fitting OOD as ID data
Eliminating threshold dependency for OOD sample identification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Intentional label-noisifying unlabeled wild data
Loss-difference framework for OOD detection
Threshold-free clustering for OOD filtering
Chuanxing Geng
Nanjing University of Aeronautics and Astronautics
Machine Learning, Pattern Recognition
Qifei Li
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
Xinrui Wang
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
Dong Liang
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics; MIIT Key Laboratory of Pattern Analysis and Machine Intelligence
Songcan Chen
Nanjing University of Aeronautics & Astronautics
Machine Learning, Pattern Recognition
Pong C. Yuen
Department of Computer Science, Hong Kong Baptist University