🤖 AI Summary
This paper addresses unsupervised panoptic segmentation in complex urban scenes, proposing the first scene-centric approach that eliminates reliance on object-centric priors and manual annotations. Methodologically, it fuses multi-modal cues (RGB appearance, estimated depth, and optical flow) to generate high-resolution panoptic pseudo-labels and introduces a two-stage panoptic self-training framework. Key technical contributions include: (1) cross-modal pseudo-label generation guided jointly by depth estimation and motion cues; and (2) a panoptic self-training strategy enforcing consistency across both semantic and instance segmentation. Evaluated on Cityscapes, the method achieves an unsupervised Panoptic Quality (PQ) of 32.1%, surpassing the prior state of the art by 9.4 PQ points, a substantial advance for unsupervised panoptic segmentation.
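The cross-modal pseudo-label generation can be pictured as fusing a per-pixel semantic pseudo-label map with instance masks obtained from depth and motion cues. The following minimal sketch illustrates one such fusion step; the function name `fuse_panoptic_pseudo_labels`, the majority-vote class assignment, and the `class_id * 1000 + instance_id` encoding are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def fuse_panoptic_pseudo_labels(semantic, instance_masks, thing_classes, label_divisor=1000):
    """Fuse a semantic pseudo-label map with instance masks (e.g. from grouping
    depth/optical-flow cues) into a single panoptic pseudo-label map.

    semantic:       (H, W) int array of semantic class ids.
    instance_masks: list of (H, W) bool arrays, one per object proposal.
    thing_classes:  set of class ids treated as countable "things".
    Encoding: panoptic_id = class_id * label_divisor + instance_id
              (instance_id = 0 for "stuff" regions).
    """
    panoptic = semantic.astype(np.int64) * label_divisor
    next_instance = 1
    for mask in instance_masks:
        # Assign each proposal the majority semantic class under its mask.
        classes, counts = np.unique(semantic[mask], return_counts=True)
        if len(classes) == 0:
            continue
        cls = int(classes[np.argmax(counts)])
        if cls not in thing_classes:
            continue  # only "thing" classes receive instance ids
        panoptic[mask] = cls * label_divisor + next_instance
        next_instance += 1
    return panoptic

if __name__ == "__main__":
    H, W = 4, 6
    semantic = np.zeros((H, W), dtype=np.int64)   # class 0 = road ("stuff")
    semantic[:, 3:] = 11                          # class 11 = car ("thing")
    car_mask = np.zeros((H, W), dtype=bool)
    car_mask[:, 3:] = True                        # one motion-segmented object
    panoptic = fuse_panoptic_pseudo_labels(semantic, [car_mask], thing_classes={11})
    print(np.unique(panoptic))                    # [0, 11001]
```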
📝 Abstract
Unsupervised panoptic segmentation aims to partition an image into semantically meaningful regions and distinct object instances without training on manually annotated data. In contrast to prior work on unsupervised panoptic scene understanding, we eliminate the need for object-centric training data, enabling the unsupervised understanding of complex scenes. To that end, we present the first unsupervised panoptic method that directly trains on scene-centric imagery. In particular, we propose an approach to obtain high-resolution panoptic pseudo-labels on complex scene-centric data, combining visual representations, depth, and motion cues. Utilizing both pseudo-label training and a panoptic self-training strategy yields a novel approach that accurately predicts panoptic segmentation of complex scenes without requiring any human annotations. Our approach significantly improves panoptic quality, e.g., surpassing the recent state of the art in unsupervised panoptic segmentation on Cityscapes by 9.4 percentage points in PQ.
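To make the self-training idea concrete, below is a minimal, hypothetical sketch in PyTorch: a frozen teacher produces pseudo-labels on unlabeled scene-centric images, and a student is trained on the confident ones. Only the semantic branch is shown; `TinyPanopticNet`, the confidence threshold, and the plain cross-entropy loss are placeholder assumptions, not the paper's actual architecture or training recipe.

```python
import torch
import torch.nn.functional as F

class TinyPanopticNet(torch.nn.Module):
    """Stand-in for a panoptic model with semantic and instance heads."""
    def __init__(self, num_classes=19, embed_dim=8):
        super().__init__()
        self.backbone = torch.nn.Conv2d(3, 16, 3, padding=1)
        self.sem_head = torch.nn.Conv2d(16, num_classes, 1)
        self.ins_head = torch.nn.Conv2d(16, embed_dim, 1)

    def forward(self, x):
        feat = torch.relu(self.backbone(x))
        return self.sem_head(feat), self.ins_head(feat)

def self_training_step(student, teacher, images, conf_thresh=0.9):
    """One self-training step: the frozen teacher produces pseudo-labels,
    and the student is supervised on the confident pixels. Confidence
    filtering and loss choice here are illustrative assumptions."""
    with torch.no_grad():
        sem_logits_t, _ = teacher(images)
        probs = sem_logits_t.softmax(dim=1)
        conf, pseudo_sem = probs.max(dim=1)
        pseudo_sem[conf < conf_thresh] = 255   # 255 = ignore index

    sem_logits_s, _ = student(images)
    return F.cross_entropy(sem_logits_s, pseudo_sem, ignore_index=255)

if __name__ == "__main__":
    student, teacher = TinyPanopticNet(), TinyPanopticNet()
    teacher.eval()
    opt = torch.optim.Adam(student.parameters(), lr=1e-4)
    images = torch.rand(2, 3, 64, 128)         # stand-in for scene-centric crops
    # conf_thresh=0.0 keeps all pixels in this smoke test (random teacher is unsure).
    loss = self_training_step(student, teacher, images, conf_thresh=0.0)
    loss.backward()
    opt.step()
    print(float(loss))
```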