🤖 AI Summary
This work addresses the challenge that out-of-distribution (OOD) objects in street scenes are frequently misclassified as background by standard object detectors. To tackle this issue, the authors propose SynOE-OD, a novel framework that, for the first time, integrates synthetic anomaly exposure with transfer learning to enable unified detection of both in-distribution (ID) and OOD objects within a single-stage detector. The method leverages Stable Diffusion to generate semantically plausible OOD samples and employs an open-vocabulary detector, such as GroundingDINO, for anomaly exposure training, without requiring auxiliary architectures. Evaluated on established street-scene OOD detection benchmarks, SynOE-OD substantially outperforms the zero-shot performance of existing open-vocabulary detectors and achieves state-of-the-art average precision.
📝 Abstract
Out-of-distribution (OOD) object detection is an important yet underexplored task. A reliable object detector should handle OOD objects by localizing them and correctly classifying them as OOD. A critical issue arises, however, when such atypical objects are missed entirely by the detector and incorrectly treated as background. Existing OOD detection approaches in object detection often rely on complex architectures or auxiliary branches and typically do not treat in-distribution (ID) and OOD objects in a unified way. In this work, we address these limitations by enabling a single detector to detect OOD objects that would otherwise be silently overlooked, alongside ID objects. We present \textbf{SynOE-OD}, a \textbf{Syn}thetic \textbf{O}utlier-\textbf{E}xposure-based \textbf{O}bject \textbf{D}etection framework that leverages strong generative models, such as Stable Diffusion, and Open-Vocabulary Object Detectors (OVODs) to generate semantically meaningful, object-level data that serve as outliers during training. The generated data are used for transfer learning to establish strong ID task performance and to equip detection models with robustness to OOD objects. Our approach achieves state-of-the-art average precision on an established OOD object detection benchmark, on which OVODs such as GroundingDINO show limited zero-shot performance in detecting OOD objects in street scenes.
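The core idea of outlier exposure at the object level can be illustrated with a minimal sketch: synthetic objects generated by a diffusion model and localized by an open-vocabulary detector are merged into the training targets under one reserved "OOD" label, so a single detector learns ID classes and OOD objects jointly. All names below are illustrative assumptions, not the paper's actual implementation or API.

```python
# Hypothetical sketch of merging ID annotations with synthetic outliers.
# Class names, box format, and function names are assumptions for
# illustration only; the paper's pipeline may differ.

ID_CLASSES = ["car", "person", "traffic light"]  # example street-scene classes
OOD_CLASS_ID = len(ID_CLASSES)  # one extra label reserved for all outliers


def merge_annotations(id_boxes, synthetic_ood_boxes):
    """Combine ID ground truth with generated outlier boxes.

    id_boxes: list of (x1, y1, x2, y2, class_id) tuples for ID objects.
    synthetic_ood_boxes: list of (x1, y1, x2, y2) tuples for generated
        objects (e.g. Stable Diffusion inpaintings localized by an OVOD
        such as GroundingDINO); each receives the reserved OOD label.
    """
    merged = list(id_boxes)
    for box in synthetic_ood_boxes:
        merged.append((*box, OOD_CLASS_ID))
    return merged


# Example: two ID objects plus one synthetic outlier in the same image
targets = merge_annotations(
    [(10, 20, 50, 80, 0), (60, 30, 90, 70, 1)],
    [(100, 40, 140, 90)],
)
```

With targets built this way, standard detector training (classification plus box regression over `len(ID_CLASSES) + 1` categories) exposes the model to outliers without any auxiliary branch, which is the unified ID/OOD treatment the abstract describes.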