🤖 AI Summary
This work addresses training-free out-of-distribution (OoD) region detection for semantic segmentation, requiring no outlier supervision or OoD annotations. The proposed method takes deep features from an InternImage-L backbone fine-tuned on segmentation datasets, models the feature structure with unsupervised K-Means clustering, and identifies OoD clusters by thresholding confidence computed from the raw decoder logits. The results provide evidence that general-purpose vision foundation models can inherently distinguish in-distribution (ID) from OoD regions. On the RoadAnomaly and ADE-OoD benchmarks, the approach achieves an Average Precision of 50.02 and 48.77, respectively, surpassing several supervised and unsupervised baselines. The work points toward simple, generic OoD segmentation methods that require minimal assumptions or additional data.
📝 Abstract
Detecting unknown objects in semantic segmentation is crucial for safety-critical applications such as autonomous driving. Large vision foundation models, including DINOv2, InternImage, and CLIP, have advanced visual representation learning by providing rich features that generalize well across diverse tasks. While their strength in closed-set semantic tasks is established, their capability to detect out-of-distribution (OoD) regions in semantic segmentation remains underexplored. In this work, we investigate whether foundation models fine-tuned on segmentation datasets can inherently distinguish in-distribution (ID) from OoD regions without any outlier supervision. We propose a simple, training-free approach that utilizes features from the InternImage backbone and applies K-Means clustering alongside confidence thresholding on raw decoder logits to identify OoD clusters. Our method achieves 50.02 Average Precision on the RoadAnomaly benchmark and 48.77 on the benchmark of ADE-OoD with InternImage-L, surpassing several supervised and unsupervised baselines. These results suggest a promising direction for generic OoD segmentation methods that require minimal assumptions or additional data.
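The clustering-plus-thresholding idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two steps are combined, so everything below is an assumption made for illustration: per-pixel features are clustered, the maximum raw logit per pixel serves as the confidence score, and a cluster is flagged as OoD when its mean confidence falls below a threshold. The function names (`kmeans`, `ood_mask`), the parameters `k` and `conf_threshold`, the array shapes, and the farthest-first initialisation are all hypothetical, and plain NumPy arrays stand in for real InternImage features and decoder outputs.

```python
import numpy as np


def kmeans(features, k, n_iters=20):
    """Plain Lloyd's K-Means on an (N, D) array; returns one cluster index per row.

    Uses a deterministic farthest-first initialisation (an assumption made for
    reproducibility, not something stated in the abstract).
    """
    features = np.asarray(features, dtype=float)
    centers = [features[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest existing center.
        nearest = np.min([((features - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(features[nearest.argmax()])
    centers = np.stack(centers)

    for _ in range(n_iters):
        # (N, k) matrix of squared distances to each center.
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = features[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)

    # Final assignment against the converged centers.
    dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def ood_mask(features, logits, k=4, conf_threshold=0.0):
    """Flag low-confidence feature clusters as OoD.

    features: (H, W, D) per-pixel backbone features (assumed layout).
    logits:   (C, H, W) raw decoder logits (assumed layout).
    Returns a boolean (H, W) mask, True where a pixel belongs to a flagged cluster.
    """
    h, w, d = features.shape
    labels = kmeans(features.reshape(-1, d), k)
    # Assumed confidence score: the maximum raw logit at each pixel.
    confidence = np.asarray(logits, dtype=float).max(axis=0).reshape(-1)

    mask = np.zeros(h * w, dtype=bool)
    for c in range(k):
        members = labels == c
        # Assumed decision rule: mean cluster confidence below the threshold.
        if members.any() and confidence[members].mean() < conf_threshold:
            mask[members] = True
    return mask.reshape(h, w)
```

Calling `ood_mask(features, logits, k=2)` on a feature map and logit tensor of matching spatial size returns a per-pixel boolean mask. The distance computation materialises an N × k × D array, so a real image would likely need batching or a library implementation of K-Means; the sketch favours readability over efficiency.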