🤖 AI Summary
Existing mask-based out-of-distribution (OoD) segmentation methods suffer from blurred boundaries, inconsistent anomaly scores within a single object, and background false positives, limitations that hinder their deployment in safety-critical applications such as autonomous driving. To address these issues, we propose an objectness-aware refinement framework that integrates object-level priors: first, it uses the Segment Anything Model (SAM) to generate instance masks for object-aware anomaly score calibration; second, it improves boundary sharpness and structural consistency via Laplacian edge enhancement and Gaussian smoothing. Building on the outputs of a pre-trained OoD backbone, the method recalibrates scores and refines contours in successive stages to improve detection reliability. On the SMIYC and RoadAnomaly benchmarks, it reaches a pixel-wise area under the precision-recall curve (AuPRC) of up to 96.99%, a false positive rate at 95% recall (FPR₉₅) as low as 0.07, and a component-level F1-score of up to 83.44%, outperforming state-of-the-art approaches.
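The two pixel-level metrics named above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the benchmarks' official evaluation code: the function names `fpr_at_95_recall` and `average_precision` are hypothetical, AuPRC is approximated by the step-wise average-precision estimator, and tied scores are ranked arbitrarily.

```python
import numpy as np

def fpr_at_95_recall(scores, labels):
    """Share of normal pixels flagged at a threshold that keeps >= 95% of anomalous pixels."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels, dtype=bool).ravel()
    pos = np.sort(scores[labels])      # anomalous-pixel scores, ascending
    k = pos.size // 20                 # at most 5% of anomalous pixels may fall below the threshold
    threshold = pos[k]
    return float(np.mean(scores[~labels] >= threshold))

def average_precision(scores, labels):
    """Step-wise estimate of the area under the precision-recall curve."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels, dtype=bool).ravel()
    order = np.argsort(-scores, kind="stable")   # rank pixels from most to least anomalous
    hits = labels[order]
    precision = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision[hits].mean())         # mean precision at each true anomaly

# Toy example: two anomalous pixels (label 1) and two normal pixels (label 0).
scores = [0.9, 0.4, 0.6, 0.1]
labels = [1, 1, 0, 0]
print(fpr_at_95_recall(scores, labels))
print(average_precision(scores, labels))
```

In this toy case one of the two normal pixels scores above the lowest-scoring anomalous pixel, so the false positive rate is one half; a perfectly separating score map would give a false positive rate of zero and an average precision of one.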
📝 Abstract
Out-of-Distribution (OoD) segmentation is critical for safety-sensitive applications like autonomous driving. However, existing mask-based methods often suffer from boundary imprecision, inconsistent anomaly scores within objects, and false positives from background noise. We propose \textbf{\textit{Objectomaly}}, an objectness-aware refinement framework that incorporates object-level priors. Objectomaly consists of three stages: (1) Coarse Anomaly Scoring (CAS) using an existing OoD backbone, (2) Objectness-Aware Score Calibration (OASC) leveraging SAM-generated instance masks for object-level score normalization, and (3) Meticulous Boundary Precision (MBP) applying Laplacian filtering and Gaussian smoothing for contour refinement. Objectomaly achieves state-of-the-art performance on key OoD segmentation benchmarks, including SMIYC AnomalyTrack/ObstacleTrack and RoadAnomaly, improving both pixel-level (AuPRC up to 96.99, FPR$_{95}$ down to 0.07) and component-level (F1-score up to 83.44) metrics. Ablation studies and qualitative results on real-world driving videos further validate the robustness and generalizability of our method. Code will be released upon publication.
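The three stages can be pictured with a minimal NumPy sketch. This is an assumption-laden illustration rather than the authors' implementation: the abstract does not give the exact formulas, so object-level normalization is approximated here by a per-mask mean, the coarse score map and instance masks are synthetic stand-ins for the OoD backbone and SAM outputs, and the names `calibrate_by_objects`, `refine_boundaries`, `edge_weight`, and `sigma` are hypothetical.

```python
import numpy as np

def conv2d_same(img, kernel):
    """Correlate img with a small kernel using edge-replicated padding (same output shape)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def calibrate_by_objects(scores, masks):
    """OASC-like step (assumed form): each instance mask takes the mean of its coarse scores."""
    out = scores.astype(float).copy()
    for mask in masks:
        if mask.any():
            out[mask] = scores[mask].mean()   # later masks overwrite earlier ones on overlap
    return out

LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 2-D Gaussian kernel built as an outer product of 1-D Gaussians."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def refine_boundaries(scores, edge_weight=0.5, sigma=1.0):
    """MBP-like step (assumed form): Laplacian sharpening followed by Gaussian smoothing."""
    sharpened = scores - edge_weight * conv2d_same(scores, LAPLACIAN)
    smoothed = conv2d_same(sharpened, gaussian_kernel(5, sigma))
    return np.clip(smoothed, 0.0, 1.0)

# Synthetic stand-ins: a noisy coarse score map (CAS) and one square instance mask (SAM).
rng = np.random.default_rng(0)
coarse = rng.uniform(0.0, 0.3, size=(16, 16))
coarse[4:10, 4:10] = rng.uniform(0.5, 1.0, size=(6, 6))
mask = np.zeros((16, 16), dtype=bool)
mask[4:10, 4:10] = True

calibrated = calibrate_by_objects(coarse, [mask])
refined = refine_boundaries(calibrated)
print(refined.shape)
```

After calibration every pixel inside the mask carries the same score, which is the intra-object consistency the abstract targets; the refinement step then operates on that calibrated map rather than on the raw backbone output.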