Prior2Former -- Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation

📅 2025-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Panoptic segmentation methods are constrained by predefined categories, which limits reliable identification of unknown classes and out-of-distribution (OOD) data and hinders deployment in safety-critical applications such as autonomous driving. To address this, the paper proposes the first framework integrating evidential deep learning into mask transformers. The method introduces a learnable Beta prior that enables assumption-free, pixel-wise uncertainty quantification over binary instance masks, without requiring OOD samples, void-class supervision, or contrastive training. It performs anomaly instance detection and panoptic segmentation end-to-end within the mask transformer architecture. Evaluated on Cityscapes, COCO, SegmentMeIfYouCan, and the OoDIS benchmark, the approach achieves state-of-the-art performance; notably, it ranks first on the OoDIS anomaly instance segmentation benchmark among methods that use no OOD training data.

📝 Abstract
In panoptic segmentation, individual instances must be separated within semantic classes. As state-of-the-art methods rely on a predefined set of classes, they struggle with novel categories and out-of-distribution (OOD) data. This is particularly problematic in safety-critical applications, such as autonomous driving, where reliability in unseen scenarios is essential. We address the gap between outstanding benchmark performance and reliability by proposing Prior2Former (P2F), the first approach for segmentation vision transformers rooted in evidential learning. P2F extends the mask vision transformer architecture by incorporating a Beta prior for computing model uncertainty in pixel-wise binary mask assignments. This design enables high-quality uncertainty estimation that effectively detects novel and OOD objects, enabling state-of-the-art anomaly instance segmentation and open-world panoptic segmentation. Unlike most segmentation models addressing unknown classes, P2F operates without access to OOD data samples or contrastive training on void (i.e., unlabeled) classes, making it highly applicable in real-world scenarios where such prior information is unavailable. Additionally, P2F can be flexibly applied to both anomaly instance and panoptic segmentation. Through comprehensive experiments on the Cityscapes, COCO, SegmentMeIfYouCan, and OoDIS datasets, we demonstrate the state-of-the-art performance of P2F. It achieves the highest ranking in the OoDIS anomaly instance benchmark among methods not using OOD data in any way.
Problem

Research questions and friction points this paper is trying to address.

Addresses novel categories and OOD data in panoptic segmentation
Enables high-quality uncertainty estimation for unseen scenarios
Operates without OOD data or contrastive training on void classes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evidential learning for mask transformers
Beta prior for pixel-wise uncertainty
No OOD data or contrastive training needed
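To make the Beta-prior idea concrete, here is a minimal, hypothetical sketch of an evidential head for binary mask prediction: the network outputs two non-negative evidence values per pixel (for and against mask membership), which parameterize a Beta distribution whose mean gives the mask probability and whose total evidence gives a vacuity-style uncertainty. This is an illustration of Beta-based evidential uncertainty in general, not the paper's exact formulation; the function name and the specific uncertainty measure are assumptions.

```python
import torch
import torch.nn.functional as F


def beta_evidential_head(logits: torch.Tensor):
    """Illustrative Beta-evidential mask head (hypothetical, not P2F's exact design).

    logits: raw network outputs of shape (N, 2, H, W), one channel of
    evidence for the pixel belonging to the mask and one against it.
    """
    # Softplus maps raw outputs to non-negative evidence values.
    evidence = F.softplus(logits)
    alpha = evidence[:, 0] + 1.0  # Beta parameter: evidence "for" + prior count 1
    beta = evidence[:, 1] + 1.0   # Beta parameter: evidence "against" + prior count 1

    # Expected mask probability under Beta(alpha, beta).
    prob = alpha / (alpha + beta)

    # Vacuity-style uncertainty: close to 1 when total evidence is low,
    # shrinking toward 0 as the model accumulates evidence for a pixel.
    uncertainty = 2.0 / (alpha + beta)
    return prob, uncertainty
```

High per-pixel uncertainty can then flag pixels of novel or OOD objects without ever training on OOD samples, which is the intuition behind evidential anomaly detection.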
Sebastian Schmidt
Technical University of Munich, BMW Group
Julius Körner
Technical University of Munich
Dominik Fuchsgruber
Technical University of Munich
Stefano Gasperini
Postdoc at Technical University of Munich (TUM)
computer vision, deep learning, autonomous driving
Federico Tombari
Google, TU Munich
Computer Vision, Machine Learning, 3D Perception
Stephan Günnemann
Technical University of Munich