🤖 AI Summary
This work addresses two limitations of existing open-set panoptic segmentation methods: they often neglect the semantic hierarchy of known categories, and they struggle to identify unknown objects. The authors propose Hyp2Former, a framework that, for the first time, incorporates hierarchy-aware hyperbolic embeddings into this task. By modeling semantic hierarchies continuously in hyperbolic space, Hyp2Former builds a structured embedding space that improves the recognition of unknown instances without explicitly modeling unknown classes. The approach integrates hierarchical semantic priors into an end-to-end panoptic segmentation architecture and achieves state-of-the-art performance on benchmark datasets including MS COCO, Cityscapes, and Lost&Found, balancing segmentation accuracy on known categories with robust discovery of unknown objects.
📝 Abstract
Recognizing unknown objects is crucial for safety-critical applications such as autonomous driving and robotics. Open-Set Panoptic Segmentation (OPS) aims to segment known thing and stuff classes while identifying valid unknown objects as separate instances. Prior OPS approaches largely treat known categories as a flat label set, ignoring the semantic hierarchy that provides valuable structural priors for distinguishing unknown objects from in-distribution classes. In this work, we propose Hyp2Former, an end-to-end framework for OPS that does not require explicit modeling of unknowns during training and instead learns hierarchical semantic similarities continuously in hyperbolic space. By explicitly encoding hierarchical relationships among known categories, the model learns a structured embedding space that captures multiple levels of semantic abstraction. As a result, unknown objects that cannot be confidently classified as known categories remain close to higher-level concepts (e.g., an unknown animal remains closer to "animal" or "object" than to unrelated concepts such as "electronics" or "stuff") and can therefore be reliably detected, even if their fine-grained category was not represented during training. Empirical evaluations on public datasets including MS COCO, Cityscapes, and Lost&Found demonstrate that Hyp2Former outperforms existing methods on OPS, achieving the best balance between unknown object discovery and in-distribution robustness.
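The intuition that an unknown animal stays closer to "animal" than to "electronics" can be illustrated with a toy example in the Poincaré ball, one common model of hyperbolic space. This is a minimal sketch and not the Hyp2Former implementation: the abstract does not say which hyperbolic model, dimensionality, or loss the authors use, and the 2-D prototype coordinates, the `prototypes` dictionary, and the `nearest_prototypes` helper below are all hypothetical. The sketch relies only on the standard Poincaré distance and on the usual convention that general concepts sit near the origin while fine-grained ones sit near the boundary.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball.

    d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# Hypothetical, hand-placed prototypes (assumption, not learned values):
# superclasses lie near the origin, leaf classes near the boundary.
prototypes = {
    "animal": (0.30, 0.00),
    "dog": (0.80, 0.10),
    "cat": (0.80, -0.10),
    "electronics": (-0.30, 0.00),
    "laptop": (-0.80, 0.00),
}


def nearest_prototypes(query):
    """Return prototype names sorted by hyperbolic distance to the query."""
    return sorted(prototypes, key=lambda name: poincare_distance(query, prototypes[name]))


# An embedding that matches no leaf class well but lies on the "animal" side.
unknown_animal = (0.55, 0.35)
ranking = nearest_prototypes(unknown_animal)
print(ranking)
```

With these illustrative coordinates, the query ranks `animal` ahead of every leaf class and ahead of both electronics prototypes, which is the kind of signal a hierarchy-aware model could use to flag a segment as an unknown object. How Hyp2Former actually turns such distances into an unknown decision is not specified in the abstract.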