Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting

📅 2024-10-07
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
In few-view robotic scenarios, 3D Gaussian Splatting (3DGS) suffers from redundant and occluded coverage when viewpoints are chosen at random, hindering real-time performance and reconstruction fidelity. Method: The paper proposes an end-to-end online active perception framework that jointly optimizes visual viewpoint and tactile contact-point selection, extending FisherRF, a Fisher-information-based next-best-view criterion, to tactile pose selection for the first time. It also introduces a SAM2-driven semantic depth alignment method that combines a Pearson-correlation depth-consistency loss with a surface normal loss to improve geometric and textural robustness in low-texture and occluded regions. Results: Evaluated on a real robotic platform, the approach improves reconstruction completeness, geometric accuracy, and rendering quality, especially in challenging low-texture and occluded areas, enabling efficient online multimodal (vision + touch) 3D reconstruction from few samples.
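The depth-consistency term mentioned above can be illustrated with a minimal sketch. Assuming (this is not the authors' code) that a rendered 3DGS depth map is compared against a monocular depth prior over the pixels of one SAM2 segment, a Pearson-correlation loss is insensitive to the unknown scale and shift of the prior:

```python
import numpy as np

def pearson_depth_loss(rendered_depth, prior_depth, eps=1e-8):
    """Scale/shift-invariant depth consistency (hypothetical sketch).

    Both inputs are depth values over the valid pixels of a segment.
    Returns 1 - Pearson correlation: 0 when the depths are perfectly
    linearly related, up to 2 when they are anti-correlated.
    """
    r = np.asarray(rendered_depth, dtype=np.float64).ravel()
    m = np.asarray(prior_depth, dtype=np.float64).ravel()
    r = r - r.mean()
    m = m - m.mean()
    corr = (r * m).sum() / (np.linalg.norm(r) * np.linalg.norm(m) + eps)
    return 1.0 - corr
```

Because monocular depth predictors are typically correct only up to an affine transform, this loss supervises the relative depth ordering within a segment without penalizing the global scale ambiguity.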

📝 Abstract
We propose a framework for active next best view and touch selection for robotic manipulators using 3D Gaussian Splatting (3DGS). 3DGS is emerging as a useful explicit 3D scene representation for robotics, as it can represent scenes in a manner that is both photorealistic and geometrically accurate. However, in real-world, online robotic scenes where the number of views is limited by efficiency requirements, random view selection for 3DGS becomes impractical, as views are often overlapping and redundant. We address this issue by proposing an end-to-end online training and active view selection pipeline, which enhances the performance of 3DGS in few-view robotics settings. We first elevate the performance of few-shot 3DGS with a novel semantic depth alignment method using Segment Anything Model 2 (SAM2), which we supplement with Pearson depth and surface normal losses to improve color and depth reconstruction of real-world scenes. We then extend FisherRF, a next-best-view selection method for 3DGS, to select views and touch poses based on depth uncertainty. We perform online view selection on a real robot system during live 3DGS training. We motivate our improvements to few-shot GS scenes and extend depth-based FisherRF to them, demonstrating both qualitative and quantitative improvements on challenging robot scenes. For more information, please see our project page at https://arm.stanford.edu/next-best-sense.
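The FisherRF-style selection the abstract describes can be sketched as a greedy criterion: score each candidate pose by how much Fisher information it would add relative to the views already captured, using the Gauss-Newton approximation H = JᵀJ of the rendering Jacobian. The toy below (an illustrative sketch under assumed names, not the authors' implementation) ranks candidates by tr(H_v (H_train + λI)⁻¹):

```python
import numpy as np

def next_best_view(candidate_jacobians, H_train, lam=1e-3):
    """Greedy FisherRF-style pose selection (hypothetical sketch).

    candidate_jacobians: one (n_pixels, n_params) Jacobian of the rendered
        output (depth, in the paper's depth-uncertainty variant) w.r.t.
        Gaussian parameters per candidate pose.
    H_train: (n_params, n_params) accumulated Fisher information of the
        views already captured.
    Returns the index of the candidate with the largest expected
    information gain tr(H_v (H_train + lam*I)^-1).
    """
    n = H_train.shape[0]
    H_inv = np.linalg.inv(H_train + lam * np.eye(n))
    gains = []
    for J in candidate_jacobians:
        H_v = J.T @ J  # Gauss-Newton Fisher approximation for this pose
        gains.append(np.trace(H_v @ H_inv))
    return int(np.argmax(gains))
```

A candidate that only re-observes parameters the training views already constrain well receives a small gain, while one covering poorly constrained parameters scores high; this is what suppresses the overlapping, redundant views that random selection produces.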
Problem

Research questions and friction points this paper is trying to address.

Random view selection for 3DGS yields overlapping, redundant views in few-view, online robotic settings.
Few-shot 3DGS reconstructs color and depth poorly in low-texture and occluded regions of real-world scenes.
Next-best-view methods such as FisherRF do not account for depth uncertainty or extend to tactile pose selection.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Active view selection using FisherRF for 3DGS
Semantic depth alignment with SAM2 and Pearson loss
Online training pipeline for few-shot 3DGS scenes