🤖 AI Summary
This work addresses limitations of existing out-of-distribution (OOD) detection methods, which often overlook semantic context in images, struggle to identify near-OOD samples, and are susceptible to simplicity bias. To overcome these issues, the authors propose an object-centric OOD detection framework built around an object co-occurrence (OCO) mechanism. The framework predicts disentangled representations to model semantic co-occurrence relationships among objects, adaptively assigns each test sample to one of three scenarios based on co-occurrence patterns observed in the ID training data, and then discriminates ID from OOD in a divide-and-conquer manner. This approach reduces reliance on simple, easily learnable regions and improves near-OOD detection. Experiments across challenging and full-spectrum OOD settings show competitive results and indicate that the method handles both semantic shift and covariate shift.
📝 Abstract
Out-of-distribution (OOD) detection is crucial for ensuring the reliability of deep learning models. Existing methods mostly rely on regular entangled representations to discriminate in-distribution (ID) from OOD data, neglecting the rich contextual information within images. This issue is particularly challenging for detecting near-OOD samples, as models with simplicity bias struggle to learn discriminative features in entangled representations. The human visual system can use the co-occurrence of objects in the natural environment to facilitate scene understanding. Inspired by this, we propose an Object-Centric OOD detection framework that learns to capture Object CO-occurrence (OCO) patterns within images. The proposed method introduces a new OOD detection paradigm: it first models object co-occurrence within an image by predicting disentangled representations for the test sample, then adaptively assigns the sample to one of three scenarios based on the object co-occurrence patterns observed in ID training data, and finally performs OOD detection in a divide-and-conquer manner. By doing so, OCO can distinguish near-OOD samples by considering the semantic contextual relationships within each image, avoiding the tendency to focus solely on simple, easily learnable regions. We evaluate OCO through experiments across challenging and full-spectrum OOD settings, demonstrating competitive results and confirming its ability to address both semantic and covariate shifts. Code is released at https://github.com/Michael-McQueen/OCO.
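To make the three-step pipeline above concrete, here is a minimal Python sketch of co-occurrence-based routing. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not define the three scenarios, so the ones used here (`unseen_object`, `seen_cooccurrence`, `unseen_cooccurrence`) are hypothetical, as are the function names and the per-scenario scores. The sketch also assumes that the object labels for each image have already been predicted, which in OCO would come from the disentangled representations.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(id_object_sets):
    """Count single objects and unordered object pairs seen in ID training images."""
    singles, pairs = Counter(), Counter()
    for objects in id_object_sets:
        unique = sorted(set(objects))  # sorted so each pair has one canonical key
        singles.update(unique)
        pairs.update(combinations(unique, 2))
    return singles, pairs


def assign_scenario(objects, singles, pairs):
    """Route a test sample to one of three hypothetical scenarios (an assumption)."""
    unique = sorted(set(objects))
    if any(singles[o] == 0 for o in unique):
        return "unseen_object"        # some object never appeared in ID data
    if all(pairs[p] > 0 for p in combinations(unique, 2)):
        return "seen_cooccurrence"    # every pair was observed together in ID data
    return "unseen_cooccurrence"      # known objects in an unfamiliar combination


def ood_score(objects, singles, pairs):
    """Divide and conquer: a different (illustrative) score per scenario."""
    scenario = assign_scenario(objects, singles, pairs)
    if scenario == "unseen_object":
        return 1.0
    if scenario == "seen_cooccurrence":
        return 0.0  # a real system would fall back to a conventional OOD score here
    all_pairs = list(combinations(sorted(set(objects)), 2))
    unseen = sum(1 for p in all_pairs if pairs[p] == 0)
    return unseen / len(all_pairs)  # fraction of pairs never seen together


# Toy ID training data: each inner list holds the objects found in one image.
id_sets = [["dog", "leash"], ["dog", "ball"], ["boat", "water"]]
singles, pairs = build_cooccurrence(id_sets)
print(assign_scenario(["dog", "ball", "water"], singles, pairs))
print(ood_score(["dog", "ball", "water"], singles, pairs))
```

In this toy setup, an image containing only known objects in an unfamiliar combination is routed to its own scenario instead of being scored on low-level appearance alone, which is the intuition the abstract gives for better near-OOD detection.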