🤖 AI Summary
This paper addresses the lack of 3D structure in single-image semantic instance segmentation by jointly modeling instance segmentation and occlusion-based relative depth ordering, avoiding the need for unreliable absolute monocular depth estimation. It introduces the task of *Occlusion-Ordered Semantic Instance Segmentation* (OOSIS) and formulates it as a labeling problem in which instances and their occlusion order are extracted simultaneously from oriented occlusion boundaries and semantic segmentation. As part of the solution, the authors develop a new approach to oriented occlusion boundaries that significantly outperforms prior work. They also introduce a joint evaluation metric based on both instance mask accuracy and the correctness of the occlusion order. Experiments on the KINS and COCOA datasets show better performance than strong baselines.
📝 Abstract
Standard semantic instance segmentation provides useful but inherently 2D information from a single image. To enable 3D analysis, one usually integrates absolute monocular depth estimation with instance segmentation. However, monocular depth estimation is a difficult task. Instead, we leverage a simpler single-image task, occlusion-based relative depth ordering, which provides coarser but still useful 3D information. We show that relative depth ordering works more reliably from occlusions than from absolute depth. We propose to solve the joint task of relative depth ordering and segmentation of instances based on occlusions. We call this task Occlusion-Ordered Semantic Instance Segmentation (OOSIS). We develop an approach to OOSIS that extracts instances and their occlusion order simultaneously from oriented occlusion boundaries and semantic segmentation. Unlike the popular detect-and-segment framework for instance segmentation, combining occlusion ordering with instance segmentation allows a simple and clean formulation of OOSIS as a labeling problem. As part of our solution for OOSIS, we develop a novel approach to oriented occlusion boundaries that significantly outperforms prior work. We also develop a new joint OOSIS metric based on both instance mask accuracy and the correctness of the occlusion order. We achieve better performance than strong baselines on the KINS and COCOA datasets.
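To make the idea of relative depth ordering from occlusions concrete, here is a minimal sketch of how pairwise occlusion relations between instances could be turned into a front-to-back ordering. It is an illustration only, not the method described in the abstract: the function name `occlusion_order`, the instance ids, and the `(front, back)` pair format are all assumptions introduced for this example, and the ordering is obtained with a plain topological sort from Python's standard library.

```python
from graphlib import CycleError, TopologicalSorter


def occlusion_order(instances, occludes):
    """Return instance ids ordered front to back (hypothetical helper).

    instances: iterable of instance ids (illustrative names, not from the paper).
    occludes:  iterable of (front, back) pairs meaning `front` occludes `back`.
    Raises ValueError when the relations are cyclic, i.e. no consistent
    front-to-back order exists.
    """
    # TopologicalSorter expects a mapping node -> predecessors.
    # An occluder must appear before every instance it occludes.
    predecessors = {inst: set() for inst in instances}
    for front, back in occludes:
        predecessors.setdefault(back, set()).add(front)
        predecessors.setdefault(front, set())
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as exc:
        raise ValueError("occlusion relations are cyclic") from exc


if __name__ == "__main__":
    # A person stands in front of a car, which is in front of a tree.
    print(occlusion_order(["car", "person", "tree"],
                          [("person", "car"), ("car", "tree")]))
```

Occlusion relations only define a partial order, so instances that never occlude one another may appear in either relative position; this matches the abstract's point that occlusion-based ordering gives coarser information than absolute depth.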