Predictive Sample Assignment for Semantically Coherent Out-of-Distribution Detection

📅 2025-12-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses semantically coherent out-of-distribution detection (SCOOD), where a model must robustly distinguish in-distribution (ID) from out-of-distribution (OOD) samples given only labeled ID data plus abundant unlabeled mixed (ID+OOD) data, a setting prone to noise interference and poor ID/OOD separation. To this end, the authors propose the Predictive Sample Assignment (PSA) framework: (1) a dual-threshold ternary assignment mechanism (ID/OOD/discard) that categorizes unlabeled samples by their predictive energy score; (2) a concept contrastive representation learning loss that enhances semantic discriminability by enlarging the ID/OOD distance in representation space; and (3) an iterative retraining strategy for progressive dataset purification. Energy-score-driven sample filtering with a dynamic discard set significantly improves the purity of the selected ID and OOD sets. On two standard SCOOD benchmarks, PSA achieves state-of-the-art detection accuracy, with markedly better ID/OOD separation quality and robustness than existing methods.

📝 Abstract
Semantically coherent out-of-distribution detection (SCOOD) is a recently proposed realistic OOD detection setting: given labeled in-distribution (ID) data and unlabeled mixed ID/OOD data as the training data, SCOOD aims to enable the trained model to accurately identify OOD samples in the testing data. Current SCOOD methods mainly adopt various clustering-based in-distribution sample filtering (IDF) strategies to select clean ID samples from the unlabeled data and treat the remaining samples as auxiliary OOD data, which inevitably introduces many noisy samples into training. To address this issue, we propose a concise SCOOD framework based on predictive sample assignment (PSA). PSA includes a dual-threshold ternary sample assignment strategy based on the predictive energy score, which significantly improves the purity of the selected ID and OOD sample sets by assigning unconfident unlabeled data to an additional discard set, and a concept contrastive representation learning loss that further expands the distance between ID and OOD samples in the representation space to assist ID/OOD discrimination. In addition, we introduce a retraining strategy to help the model fully fit the selected auxiliary ID/OOD samples. Experiments on two standard SCOOD benchmarks demonstrate that our approach outperforms state-of-the-art methods by a significant margin.
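The dual-threshold ternary assignment described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the threshold names `t_id`/`t_ood`, the temperature default, and the concrete threshold values are assumptions for demonstration. The energy score follows the standard definition E(x) = -T · log Σ exp(logits/T), where lower energy indicates more ID-like samples.

```python
import numpy as np

def energy_score(logits, T=1.0):
    # Standard energy score: E(x) = -T * log(sum(exp(logits / T))).
    # Lower energy indicates a more confident, ID-like prediction.
    return -T * np.log(np.exp(logits / T).sum(axis=1))

def ternary_assign(logits, t_id, t_ood, T=1.0):
    """Dual-threshold ternary assignment (illustrative sketch).

    Samples with energy below t_id are kept as auxiliary ID,
    samples with energy above t_ood as auxiliary OOD, and the
    unconfident middle band is discarded rather than forced into
    either set, improving the purity of both selected sets.
    Threshold names and values are hypothetical, not from the paper.
    """
    e = energy_score(logits, T)
    labels = np.full(len(e), "discard", dtype=object)
    labels[e < t_id] = "id"
    labels[e > t_ood] = "ood"
    return labels
```

A confident prediction (one dominant logit) yields low energy and is assigned to the ID set; a flat, low-magnitude logit vector yields high energy and is assigned to the OOD set; everything in between is discarded.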
Problem

Research questions and friction points this paper is trying to address.

Detects out-of-distribution samples in testing data under the semantically coherent (SCOOD) setting
Reduces noisy samples in training by filtering the unlabeled ID/OOD mixture
Improves the purity of the selected auxiliary ID and OOD sample sets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Predictive energy score dual-threshold ternary assignment
Concept contrastive representation learning loss
Retraining strategy for auxiliary sample fitting
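The concept contrastive loss listed above is not specified in detail in this summary; the sketch below shows a generic supervised-contrastive formulation consistent with its stated goal of expanding the ID/OOD distance in representation space. The convention that OOD samples carry label `-1` and act only as negatives is an assumption for illustration.

```python
import numpy as np

def concept_contrastive_loss(features, labels, temp=0.1):
    """Supervised-contrastive-style loss (illustrative sketch only).

    ID samples sharing a class label are treated as positives and
    pulled together; all other samples in the batch, including OOD
    samples (label == -1), appear in the denominator as negatives
    and are pushed away. OOD samples serve only as negatives and
    are never anchors.
    """
    # L2-normalize features so dot products are cosine similarities.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / temp
    n = len(labels)
    loss, count = 0.0, 0
    for i in range(n):
        if labels[i] == -1:  # OOD anchors contribute no positive pairs
            continue
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue
        denom = np.exp(sim[i][np.arange(n) != i]).sum()
        for j in pos:
            loss += -np.log(np.exp(sim[i, j]) / denom)
            count += 1
    return loss / max(count, 1)
```

When same-class ID features are tightly clustered and OOD features lie far away, the loss is near zero; mixing ID and OOD features in representation space drives it up, which is the separation behavior the loss is meant to enforce.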