Predictive Sample Assignment for Semantically Coherent Out-of-Distribution Detection

📅 2025-12-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses semantically coherent out-of-distribution detection (SCOOD), where a model must robustly distinguish in-distribution (ID) from out-of-distribution (OOD) samples given only labeled ID data plus abundant unlabeled mixed (ID+OOD) data, a setting prone to noise interference and poor ID/OOD separation. To this end, the authors propose the Predictive Sample Assignment (PSA) framework: (1) a dual-threshold ternary assignment mechanism (ID/OOD/discard) that categorizes unlabeled samples by their predictive energy score; (2) a concept contrastive representation learning loss that enhances semantic discriminability by enlarging the ID/OOD distance in representation space; and (3) an iterative retraining strategy for progressive dataset purification. Energy-score-driven sample filtering with a dynamic discard set significantly improves the purity of the selected ID and OOD sets. On two standard SCOOD benchmarks, PSA achieves state-of-the-art detection accuracy, with markedly better ID/OOD separation quality and robustness than existing methods.

📝 Abstract
Semantically coherent out-of-distribution detection (SCOOD) is a recently proposed realistic OOD detection setting: given labeled in-distribution (ID) data and unlabeled mixed ID/OOD data as the training data, SCOOD aims to enable the trained model to accurately identify OOD samples in the testing data. Current SCOOD methods mainly adopt various clustering-based in-distribution sample filtering (IDF) strategies to select clean ID samples from the unlabeled data and treat the remaining samples as auxiliary OOD data, which inevitably introduces many noisy samples into training. To address this issue, we propose a concise SCOOD framework based on predictive sample assignment (PSA). PSA includes a dual-threshold ternary sample assignment strategy based on the predictive energy score, which significantly improves the purity of the selected ID and OOD sample sets by assigning unconfident unlabeled data to an additional discard set, and a concept contrastive representation learning loss that further expands the distance between ID and OOD samples in the representation space to assist ID/OOD discrimination. In addition, we introduce a retraining strategy to help the model fully fit the selected auxiliary ID/OOD samples. Experiments on two standard SCOOD benchmarks demonstrate that our approach outperforms state-of-the-art methods by a significant margin.
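The dual-threshold ternary assignment described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the threshold names `t_id`/`t_ood`, the temperature default, and the concrete threshold values are assumptions for demonstration. The energy score follows the standard definition E(x) = -T · log Σ exp(logits/T), where lower energy indicates more ID-like samples.

```python
import numpy as np

def energy_score(logits, T=1.0):
    # Standard energy score: E(x) = -T * log(sum(exp(logits / T))).
    # Lower energy indicates a more confident, ID-like prediction.
    return -T * np.log(np.exp(logits / T).sum(axis=1))

def ternary_assign(logits, t_id, t_ood, T=1.0):
    """Dual-threshold ternary assignment (illustrative sketch).

    Samples with energy below t_id are kept as auxiliary ID,
    samples with energy above t_ood as auxiliary OOD, and the
    unconfident middle band is discarded rather than forced into
    either set, improving the purity of both selected sets.
    Threshold names and values are hypothetical, not from the paper.
    """
    e = energy_score(logits, T)
    labels = np.full(len(e), "discard", dtype=object)
    labels[e < t_id] = "id"
    labels[e > t_ood] = "ood"
    return labels
```

A confident prediction (one dominant logit) yields low energy and is assigned to the ID set; a flat, low-magnitude logit vector yields high energy and is assigned to the OOD set; everything in between is discarded.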
Problem

Research questions and friction points this paper is trying to address.

Detects out-of-distribution samples in testing data under the semantically coherent (SCOOD) setting
Reduces noisy samples in training by filtering the unlabeled ID/OOD mixture
Improves the purity of the selected auxiliary ID and OOD sample sets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Predictive energy score dual-threshold ternary assignment
Concept contrastive representation learning loss
Retraining strategy for auxiliary sample fitting
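The concept contrastive loss listed above is not specified in detail in this summary; the sketch below shows a generic supervised-contrastive formulation consistent with its stated goal of expanding the ID/OOD distance in representation space. The convention that OOD samples carry label `-1` and act only as negatives is an assumption for illustration.

```python
import numpy as np

def concept_contrastive_loss(features, labels, temp=0.1):
    """Supervised-contrastive-style loss (illustrative sketch only).

    ID samples sharing a class label are treated as positives and
    pulled together; all other samples in the batch, including OOD
    samples (label == -1), appear in the denominator as negatives
    and are pushed away. OOD samples serve only as negatives and
    are never anchors.
    """
    # L2-normalize features so dot products are cosine similarities.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / temp
    n = len(labels)
    loss, count = 0.0, 0
    for i in range(n):
        if labels[i] == -1:  # OOD anchors contribute no positive pairs
            continue
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue
        denom = np.exp(sim[i][np.arange(n) != i]).sum()
        for j in pos:
            loss += -np.log(np.exp(sim[i, j]) / denom)
            count += 1
    return loss / max(count, 1)
```

When same-class ID features are tightly clustered and OOD features lie far away, the loss is near zero; mixing ID and OOD features in representation space drives it up, which is the separation behavior the loss is meant to enforce.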