🤖 AI Summary
This paper addresses open-vocabulary semantic 3D mapping, tackling two core challenges: (i) dynamic determination of scene semantic granularity—i.e., “what constitutes an object”—and (ii) robust fusion of multi-view 2D semantic observations into compact, high-fidelity 3D reconstructions. To this end, the authors propose a task-driven mechanism that adapts semantic granularity to the task at hand, and introduce a Bayesian semantic fusion framework grounded in the properties of the underlying visual-language model (VLM)—replacing conventional embedding averaging. The method integrates a semantic Gaussian splatting representation, 3D Gaussian clustering, and object-centric extraction to jointly enable open-vocabulary semantic recognition and dense scene reconstruction. Experiments demonstrate substantial improvements in semantic consistency and geometric fidelity, with a 42% reduction in memory footprint compared to SemGS. The implementation is publicly available.
📝 Abstract
Open-set semantic mapping requires (i) determining the correct granularity to represent the scene (e.g., how objects should be defined), and (ii) fusing semantic knowledge across multiple 2D observations into an overall 3D reconstruction, ideally with high fidelity yet a low memory footprint. While most related works bypass the first issue by grouping together primitives with similar semantics (according to some manually tuned threshold), we recognize that the object granularity is task-dependent, and develop a task-driven semantic mapping approach. To address the second issue, current practice is to average visual embedding vectors over multiple views. Instead, we show the benefits of using a probabilistic approach based on the properties of the underlying visual-language foundation model, and of leveraging Bayesian updating to aggregate multiple observations of the scene. The result is Bayesian Fields, a task-driven and probabilistic approach for open-set semantic mapping. To enable high-fidelity objects and a dense scene representation, Bayesian Fields uses 3D Gaussians, which we cluster into task-relevant objects, allowing for both easy 3D object extraction and reduced memory usage. We release Bayesian Fields open-source at https://github.com/MIT-SPARK/Bayesian-Fields.
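To make the contrast between embedding averaging and Bayesian updating concrete, here is a minimal, hypothetical sketch of fusing per-view semantic evidence for a single 3D primitive. The softmax-likelihood model, the function names, and the toy logits are illustrative assumptions, not the paper's actual formulation: the Bayesian variant multiplies per-view class likelihoods (sums log-probabilities) and renormalizes, so a confident view is not diluted by an ambiguous one the way plain averaging dilutes it.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over class scores.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def bayesian_fuse(view_logits, prior=None):
    """Fuse per-view class evidence by Bayesian updating.

    view_logits: (num_views, num_classes) array of per-view scores
    (e.g., VLM embedding similarities against text queries).
    Multiplies per-view likelihoods (sums log-probs), then renormalizes.
    """
    view_logits = np.asarray(view_logits, dtype=float)
    log_post = np.log([softmax(l) for l in view_logits]).sum(axis=0)
    if prior is not None:
        log_post += np.log(prior)
    return softmax(log_post)  # posterior distribution over classes

# Two toy views of the same primitive: one confident, one ambiguous.
views = [[4.0, 1.0, 0.0],   # strongly favors class 0
         [1.2, 1.0, 1.1]]   # nearly uninformative

posterior = bayesian_fuse(views)
averaged = softmax(np.mean(views, axis=0))  # averaging-based analogue

# The Bayesian posterior keeps more of the confident view's certainty
# than averaging does.
print(posterior, averaged)
```

In this toy setup the ambiguous second view barely changes the Bayesian posterior, whereas averaging its logits with the confident view's pulls the fused distribution noticeably toward uniform.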