🤖 AI Summary
Existing robotic semantic recognition methods are constrained by closed-category vocabularies, lack support for open-vocabulary queries, and fail to quantify prediction uncertainty. To address these limitations, we propose a visual–language latent-space continuous semantic mapping framework operating under an open-vocabulary setting. Our approach is the first to integrate Bayesian kernel inference (BKI) with multimodal neural embeddings—specifically leveraging CLIP-based vision–language representations—to enable probabilistic, spatially continuous voxel-level semantic modeling. It jointly incorporates BKI-driven uncertainty propagation, voxelized uncertainty representation, and recursive multi-observation fusion. Evaluated on Matterport3D and SemanticKITTI, our method significantly outperforms both explicit semantic mapping and state-of-the-art vision–language mapping approaches. Furthermore, real-world experiments in complex indoor environments demonstrate its robustness and strong generalization to unseen categories and scenes.
📝 Abstract
This paper introduces a novel probabilistic mapping algorithm, LatentBKI, which enables open-vocabulary mapping with quantifiable uncertainty. Traditionally, semantic mapping algorithms focus on a fixed set of semantic categories, which limits their applicability for complex robotic tasks. Vision-Language (VL) models have recently emerged as a technique to jointly model language and visual features in a latent space, enabling semantic recognition beyond a predefined, fixed set of semantic classes. LatentBKI recurrently incorporates neural embeddings from VL models into a voxel map with quantifiable uncertainty, leveraging the spatial correlations of nearby observations through Bayesian Kernel Inference (BKI). LatentBKI is evaluated against similar explicit semantic mapping and VL mapping frameworks on the popular Matterport3D and SemanticKITTI datasets, demonstrating that LatentBKI maintains the probabilistic benefits of continuous mapping with the additional benefit of open-dictionary queries. Real-world experiments demonstrate applicability to challenging indoor environments.
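To make the core idea concrete, the sketch below illustrates BKI-style recursive fusion of per-point VL embeddings into a voxel map, followed by an open-vocabulary query via cosine similarity against a text embedding. This is a minimal illustration under assumed choices, not the paper's implementation: the class name `LatentVoxelMap`, the sparse distance kernel, the sparse-dictionary storage, and all parameter values are hypothetical, and real CLIP embeddings (512-D or larger) would replace the toy vectors shown.

```python
import numpy as np

class LatentVoxelMap:
    """Hypothetical sketch: recursive BKI-style fusion of latent embeddings into voxels."""

    def __init__(self, voxel_size=0.1, embed_dim=512, length_scale=0.2):
        self.voxel_size = voxel_size
        self.embed_dim = embed_dim
        self.length_scale = length_scale
        # Per-voxel sufficient statistics: weighted embedding sum and total
        # kernel mass. The mass acts as accumulated evidence, so low-mass
        # voxels can be flagged as high-uncertainty.
        self.sums = {}   # voxel index -> (embed_dim,) weighted sum
        self.mass = {}   # voxel index -> scalar evidence

    def _kernel(self, d):
        # Simple sparse kernel: weight decays linearly with distance and is
        # exactly zero beyond the length scale (a stand-in for the sparse
        # kernels typically used with BKI).
        return max(1.0 - d / self.length_scale, 0.0)

    def update(self, points, embeddings):
        """Fuse point-wise embeddings into nearby voxels (one recursive step)."""
        for p, e in zip(points, embeddings):
            base = np.floor(p / self.voxel_size).astype(int)
            # Spread each observation over a 3x3x3 voxel neighborhood so that
            # spatially correlated voxels share evidence.
            for offset in np.ndindex(3, 3, 3):
                idx = tuple(base + np.array(offset) - 1)
                center = (np.array(idx) + 0.5) * self.voxel_size
                w = self._kernel(np.linalg.norm(p - center))
                if w <= 0.0:
                    continue
                self.sums[idx] = self.sums.get(idx, np.zeros(self.embed_dim)) + w * e
                self.mass[idx] = self.mass.get(idx, 0.0) + w

    def query(self, text_embedding):
        """Score each voxel's mean embedding against a text query (cosine)."""
        scores = {}
        for idx, s in self.sums.items():
            mean = s / self.mass[idx]
            denom = np.linalg.norm(mean) * np.linalg.norm(text_embedding) + 1e-9
            scores[idx] = float(mean @ text_embedding / denom)
        return scores
```

Because the map stores only a weighted sum and a scalar mass per voxel, each update is a constant-time recursive fusion step, and the mean embedding can be recovered at any time without revisiting past observations.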