OnlinePG: Online Open-Vocabulary Panoptic Mapping with 3D Gaussian Splatting

📅 2026-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods for open-vocabulary scene understanding are mostly offline or lack instance-level understanding, which limits their use in real-world robotic tasks. This work proposes OnlinePG, a system that integrates 3D Gaussian Splatting with open-vocabulary perception in an online framework. Using a local-to-global sliding-window paradigm, OnlinePG builds a 3D segment clustering graph that combines geometric and semantic cues to form complete local instances, then fuses them into the global map through bidirectional bipartite instance matching. Vision-language model (VLM) features fused into 3D spatial attribute grids provide open-vocabulary understanding. On widely used datasets, the method outperforms other online approaches while maintaining real-time efficiency.
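To make the clustering idea concrete, here is a minimal sketch of how 3D segments might be merged using a joint geometric and semantic affinity. It is not the authors' implementation: the bounding-box IoU as the geometric cue, the cosine similarity of feature vectors as the semantic cue, the weight `w_geo`, the `threshold`, and all function names are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def box_volume(box):
    """Volume of an axis-aligned box given as (min_xyz, max_xyz)."""
    return math.prod(hi - lo for lo, hi in zip(box[0], box[1]))


def box_iou(a, b):
    """IoU of two axis-aligned 3D boxes given as (min_xyz, max_xyz)."""
    inter = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(a[0], a[1], b[0], b[1]):
        overlap = min(hi_a, hi_b) - max(lo_a, lo_b)
        if overlap <= 0:
            return 0.0
        inter *= overlap
    union = box_volume(a) + box_volume(b) - inter
    return inter / union if union > 0 else 0.0


def cluster_segments(boxes, feats, w_geo=0.5, threshold=0.6):
    """Assign one instance label per segment.

    An edge joins two segments when the weighted sum of geometric overlap
    and semantic similarity reaches `threshold` (hypothetical values);
    connected components of that graph become instances (union-find).
    """
    n = len(boxes)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            affinity = (w_geo * box_iou(boxes[i], boxes[j])
                        + (1.0 - w_geo) * cosine(feats[i], feats[j]))
            if affinity >= threshold:
                parent[find(j)] = find(i)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

With these default values, semantic similarity alone (at most 0.5 of the score) cannot merge two segments, so two distant objects of the same category stay separate instances.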

📝 Abstract
Open-vocabulary scene understanding with online panoptic mapping is essential for embodied applications to perceive and interact with environments. However, existing methods are predominantly offline or lack instance-level understanding, limiting their applicability to real-world robotic tasks. In this paper, we propose OnlinePG, a novel and effective system that integrates geometric reconstruction and open-vocabulary perception using 3D Gaussian Splatting in an online setting. Technically, to achieve online panoptic mapping, we employ an efficient local-to-global paradigm with a sliding window. To build a locally consistent map, we construct a 3D segment clustering graph that jointly leverages geometric and semantic cues, fusing inconsistent segments within the sliding window into complete instances. Subsequently, to update the global map, we construct explicit grids with spatial attributes for the local 3D Gaussian map and fuse them into the global map via robust bidirectional bipartite 3D Gaussian instance matching. Finally, we utilize the fused VLM features inside the 3D spatial attribute grids to achieve open-vocabulary scene understanding. Extensive experiments on widely used datasets demonstrate that our method performs better than existing online approaches while maintaining real-time efficiency.
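
The abstract does not spell out the bidirectional matching step. One plausible reading, sketched below, is a mutual-consistency check: a local instance is merged into a global instance only when each is the other's best match. The overlap matrix (for example, the fraction of shared grid cells), the `min_overlap` threshold, and both function names are assumptions for illustration, not the paper's formulation.

```python
def mutual_best_matches(overlap, min_overlap=0.3):
    """Return (local, global) index pairs that pick each other as best match.

    overlap[i][j] is a hypothetical overlap score between local instance i
    and global instance j, e.g. the fraction of shared spatial grid cells.
    """
    if not overlap or not overlap[0]:
        return []
    n_local, n_global = len(overlap), len(overlap[0])
    # Forward direction: best global candidate for every local instance.
    best_global = [max(range(n_global), key=lambda j: overlap[i][j])
                   for i in range(n_local)]
    # Backward direction: best local candidate for every global instance.
    best_local = [max(range(n_local), key=lambda i: overlap[i][j])
                  for j in range(n_global)]
    return [(i, j) for i, j in enumerate(best_global)
            if best_local[j] == i and overlap[i][j] >= min_overlap]


def fuse_instance_ids(overlap, n_global, min_overlap=0.3):
    """Give each local instance a global ID: matched ones reuse the matched
    global ID, unmatched ones receive fresh IDs starting at n_global."""
    matches = dict(mutual_best_matches(overlap, min_overlap))
    ids, next_id = [], n_global
    for i in range(len(overlap)):
        if i in matches:
            ids.append(matches[i])
        else:
            ids.append(next_id)
            next_id += 1
    return ids
```

Requiring agreement in both directions rejects one-sided matches, such as a small local fragment whose best candidate is a large global instance that is already better explained by another local instance.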

Problem

Research questions and friction points this paper is trying to address.

online panoptic mapping
open-vocabulary scene understanding
instance-level perception
real-time robotic perception
3D scene representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

online panoptic mapping
3D Gaussian Splatting
open-vocabulary perception
instance-level fusion
sliding window optimization