PanSt3R: Multi-view Consistent Panoptic Segmentation

📅 2025-06-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods for 3D scene panoptic segmentation from unposed 2D images model cross-view spatial relationships insufficiently and rely on time-consuming per-scene test-time optimization. Method: This paper introduces PanSt3R, an end-to-end framework that unifies geometric reconstruction and multi-view-consistent panoptic segmentation in a single forward pass, eliminating test-time optimization entirely. Contribution/Results: Key components include: (i) implicit geometry modeling extended from MUSt3R; (ii) semantic-aware cross-view feature alignment; (iii) semantics-guided attention; and (iv) a lightweight 3D mask aggregation strategy. The framework also supports novel-view panoptic prediction via 3D Gaussian Splatting (3DGS). It achieves state-of-the-art performance on multiple benchmarks while running orders of magnitude faster than existing methods.
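The "lightweight 3D mask aggregation" mentioned above is not detailed here, but the general idea of merging per-view 2D instance masks into scene-level 3D instances can be sketched as follows. This is a hedged illustration, not the paper's actual algorithm: it lifts each mask to 3D via per-pixel pointmaps (as DUSt3R-style models produce), voxelizes the footprints, and greedily groups masks whose 3D footprints overlap. The function names, voxel size, and IoU threshold are all assumptions for illustration.

```python
import numpy as np

def voxelize(points, voxel=0.05):
    """Quantize 3D points to integer voxel keys for fast set overlap tests."""
    keys = np.floor(points / voxel).astype(np.int64)
    return set(map(tuple, keys))

def merge_masks_3d(masks, pointmaps, voxel=0.05, iou_thresh=0.25):
    """Greedily group per-view 2D masks whose lifted 3D footprints overlap.

    masks:     list of (H, W) boolean arrays, one instance mask per entry
    pointmaps: list of (H, W, 3) per-pixel 3D points from the same views
    Returns groups of mask indices, each group one putative 3D instance.
    """
    voxels = [voxelize(pm[m], voxel) for m, pm in zip(masks, pointmaps)]
    groups = []  # each entry: (set of mask indices, union of their voxels)
    for i, v in enumerate(voxels):
        best, best_iou = None, iou_thresh
        for g in groups:
            union = len(v | g[1])
            iou = len(v & g[1]) / union if union else 0.0
            if iou > best_iou:
                best, best_iou = g, iou
        if best is None:
            groups.append(({i}, set(v)))
        else:
            best[0].add(i)
            best[1].update(v)
    return [sorted(g[0]) for g in groups]
```

The key property this sketch shares with the paper's setting is that association happens in 3D, not via 2D appearance matching, so the same object seen from very different viewpoints can still be fused.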

📝 Abstract
Panoptic segmentation of 3D scenes, involving the segmentation and classification of object instances in a dense 3D reconstruction of a scene, is a challenging problem, especially when relying solely on unposed 2D images. Existing approaches typically leverage off-the-shelf models to extract per-frame 2D panoptic segmentations, before optimizing an implicit geometric representation (often based on NeRF) to integrate and fuse the 2D predictions. We argue that relying on 2D panoptic segmentation for a problem inherently 3D and multi-view is likely suboptimal as it fails to leverage the full potential of spatial relationships across views. In addition to requiring camera parameters, these approaches also necessitate computationally expensive test-time optimization for each scene. Instead, in this work, we propose a unified and integrated approach PanSt3R, which eliminates the need for test-time optimization by jointly predicting 3D geometry and multi-view panoptic segmentation in a single forward pass. Our approach builds upon recent advances in 3D reconstruction, specifically upon MUSt3R, a scalable multi-view version of DUSt3R, and enhances it with semantic awareness and multi-view panoptic segmentation capabilities. We additionally revisit the standard post-processing mask merging procedure and introduce a more principled approach for multi-view segmentation. We also introduce a simple method for generating novel-view predictions based on the predictions of PanSt3R and vanilla 3DGS. Overall, the proposed PanSt3R is conceptually simple, yet fast and scalable, and achieves state-of-the-art performance on several benchmarks, while being orders of magnitude faster than existing methods.
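The abstract mentions generating novel-view panoptic predictions from PanSt3R outputs combined with vanilla 3DGS. One simple way to realize such label transfer, shown below as a hedged sketch rather than the paper's exact procedure, is to attach each reconstructed 3D point's panoptic ID and, for every surface point rendered from a novel view, look up the nearest labeled point. All names and the `max_dist` rejection threshold are illustrative assumptions.

```python
import numpy as np

def transfer_labels(labeled_pts, labels, query_pts, max_dist=0.1):
    """Assign each query 3D point the panoptic ID of its nearest labeled point.

    labeled_pts: (N, 3) reconstructed points with known panoptic IDs
    labels:      (N,) integer panoptic IDs
    query_pts:   (M, 3) surface points rendered from a novel view
    Points farther than max_dist from every labeled point get -1 (void).
    """
    # Brute-force nearest neighbor; swap in a KD-tree for large point clouds.
    d = np.linalg.norm(query_pts[:, None, :] - labeled_pts[None, :, :], axis=-1)
    nn = d.argmin(axis=1)
    out = labels[nn].copy()
    out[d[np.arange(len(query_pts)), nn] > max_dist] = -1
    return out
```

Because the labels live on the 3D reconstruction rather than in any single image, predictions obtained this way are consistent across all rendered viewpoints by construction.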
Problem

Research questions and friction points this paper is trying to address.

3D panoptic segmentation from unposed 2D images
Eliminating test-time optimization for scene processing
Improving multi-view consistency in segmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint 3D geometry and panoptic segmentation prediction
Eliminates test-time optimization with single forward pass
Enhances MUSt3R with semantic and panoptic capabilities