🤖 AI Summary
This work addresses inconsistent panoptic segmentation across camera views in operating room (OR) scenes. With only a few viewpoints and severe occlusions, an object that is poorly visible in some cameras is often mispredicted, which undermines spatial understanding during surgery. The authors propose PanORama, an architecture that is multiview-consistent by design: cross-view feature interactions are modeled inside the backbone in a single forward pass, so consistency emerges directly rather than through post-hoc refinement. The method is calibration-free, requiring no camera parameters, and generalizes to unseen camera viewpoints at inference time. On the MM-OR and 4D-OR datasets, it achieves over 70% Panoptic Quality (PQ) and outperforms the previous state of the art.
📝 Abstract
Operating rooms (ORs) are cluttered, dynamic, highly occluded environments, where reliable spatial understanding is essential for situational awareness during complex surgical workflows. Panoptic segmentation from sparse multiview images poses a fundamental challenge for such understanding, as limited visibility in a subset of views often leads to mispredictions across cameras. To this end, we introduce PanORama, the first panoptic segmentation method for the operating room that is multiview-consistent by design. By modeling cross-view interactions at the feature level inside the backbone in a single forward pass, view consistency emerges directly rather than through post-hoc refinement. We evaluate on the MM-OR and 4D-OR datasets, achieving over 70% Panoptic Quality (PQ) and outperforming the previous state of the art. Importantly, PanORama is calibration-free, requiring no camera parameters, and generalizes to unseen camera viewpoints within any multiview configuration at inference time. By substantially enhancing multiview segmentation and, consequently, spatial understanding in the OR, we believe our approach opens new opportunities for surgical perception and assistance. Code will be released upon acceptance.
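For readers unfamiliar with the metric, the Panoptic Quality (PQ) score reported above can be illustrated with a minimal sketch. It follows the standard definition of PQ for a single class: the sum of IoUs over matched segment pairs, divided by |TP| + 0.5|FP| + 0.5|FN|, where a predicted and a ground-truth segment match if their IoU exceeds 0.5. The function name `panoptic_quality`, its arguments, and the toy numbers are hypothetical and chosen for illustration only; this is not the authors' evaluation code, and benchmark scores are typically obtained by averaging per-class PQ over all classes.

```python
from typing import Dict, Tuple


def panoptic_quality(
    ious: Dict[Tuple[int, int], float],
    num_gt: int,
    num_pred: int,
    iou_threshold: float = 0.5,
) -> Tuple[float, float, float]:
    """Return (PQ, SQ, RQ) for one class.

    `ious` maps (gt_segment_id, pred_segment_id) to the IoU of that pair.
    With a threshold of 0.5, each segment can match at most one partner,
    so every pair above the threshold counts as a true positive.
    """
    matched = [iou for iou in ious.values() if iou > iou_threshold]
    tp = len(matched)
    fp = num_pred - tp  # predicted segments left unmatched
    fn = num_gt - tp    # ground-truth segments left unmatched
    denom = tp + 0.5 * fp + 0.5 * fn
    if denom == 0:
        # No segments of this class in either prediction or ground truth.
        return 0.0, 0.0, 0.0
    sq = sum(matched) / tp if tp else 0.0  # segmentation quality: mean IoU of matches
    rq = tp / denom                        # recognition quality: F1-style detection score
    return sq * rq, sq, rq


# Toy example (made-up numbers): 3 ground-truth and 4 predicted segments,
# two of which match with IoU above 0.5.
toy_ious = {(0, 0): 0.8, (1, 1): 0.6, (2, 3): 0.3}
pq, sq, rq = panoptic_quality(toy_ious, num_gt=3, num_pred=4)
print(f"PQ={pq:.3f} SQ={sq:.3f} RQ={rq:.3f}")  # PQ is about 0.4 here (1.4 / 3.5)
```

Because PQ factors into segmentation quality (SQ) and recognition quality (RQ), a segment that is missed or mislabeled in a poorly visible view lowers RQ even when the matched segments have accurate masks.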