SplatSearch: Instance Image Goal Navigation for Mobile Robots using 3D Gaussian Splatting and Diffusion Models

📅 2025-11-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses Instance Image Goal Navigation (IIN) in unknown environments using a single reference image, where the reference viewpoint is arbitrary and the scene only supports sparse-view reconstruction. To tackle this, we propose a navigation framework integrating 3D Gaussian Splatting (3DGS) reconstruction, multi-view diffusion-based completion, and semantic visual context guidance. First, a sparse 3DGS map is constructed and used to render candidate object views from multiple viewpoints; second, diffusion models are employed to inpaint occluded regions and fill in missing viewpoints; finally, frontier exploration decisions are made by jointly leveraging rendered visual features and semantic context. To our knowledge, this is the first work to synergistically combine 3DGS and diffusion models for robotic target search, significantly improving matching robustness and semantic consistency. Evaluated in both simulated and real-world home environments, our method achieves state-of-the-art performance in success rate and path length. Ablation studies validate the effectiveness of each component.
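The matching step of the pipeline above can be sketched with a small, self-contained example: each candidate object gets several completed views, and the object whose best view matches the goal image is selected. All names and the toy feature vectors here are illustrative stand-ins, not the authors' API; the real system compares features of diffusion-completed 3DGS renders against the goal image.

```python
# Toy sketch of multi-view matching against a goal image.
# Feature vectors are hand-picked stand-ins for learned image features.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def best_candidate(goal_feat, candidates):
    """Pick the object whose best completed view matches the goal.

    `candidates` maps an object id to a list of per-view feature vectors,
    standing in for features of diffusion-completed 3DGS renders.
    """
    scored = {
        obj: max(cosine(goal_feat, view) for view in views)
        for obj, views in candidates.items()
    }
    return max(scored, key=scored.get), scored

goal = [1.0, 0.2, 0.0]
candidates = {
    "chair": [[0.9, 0.3, 0.1], [0.1, 0.9, 0.0]],  # one view matches well
    "plant": [[0.0, 1.0, 0.5], [0.2, 0.8, 0.9]],  # no view matches
}
obj, scores = best_candidate(goal, candidates)
print(obj)  # → "chair"
```

Taking the maximum over views is what makes the arbitrary-viewpoint setting tractable: only one synthesized view needs to line up with the reference image.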

📝 Abstract
The Instance Image Goal Navigation (IIN) problem requires mobile robots deployed in unknown environments to search for specific objects or people of interest using only a single reference goal image of the target. This problem is especially challenging when: 1) the reference image is captured from an arbitrary viewpoint, and 2) the robot must operate with sparse-view scene reconstructions. In this paper, we address the IIN problem by introducing SplatSearch, a novel architecture that leverages sparse-view 3D Gaussian Splatting (3DGS) reconstructions. SplatSearch renders multiple viewpoints around candidate objects using a sparse online 3DGS map, and uses a multi-view diffusion model to complete missing regions of the rendered images, enabling robust feature matching against the goal image. A novel frontier exploration policy is introduced which combines visual context from the synthesized viewpoints with semantic context from the goal image to evaluate frontier locations, allowing the robot to prioritize frontiers that are semantically and visually relevant to the goal image. Extensive experiments in photorealistic home and real-world environments demonstrate that SplatSearch outperforms current state-of-the-art methods in terms of Success Rate and Success Path Length. An ablation study confirms the design choices of SplatSearch.
Problem

Research questions and friction points this paper is trying to address.

Navigate robots using single reference images from arbitrary viewpoints
Enable object search with sparse-view 3D scene reconstructions
Improve frontier selection using semantic and visual goal relevance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses 3D Gaussian Splatting for sparse-view scene reconstruction
Employs multi-view diffusion model to complete missing image regions
Introduces frontier exploration policy with semantic visual context
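The frontier policy in the last bullet can be sketched as a weighted combination of visual and semantic relevance per frontier. The weight `alpha`, the helper names, and the feature vectors below are assumptions for illustration, not the paper's actual parameters.

```python
# Hypothetical frontier-scoring sketch: rank frontiers by a weighted mix
# of visual similarity (synthesized views near the frontier vs. the goal
# image) and semantic similarity (local context vs. goal semantics).

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def score_frontiers(goal_visual, goal_semantic, frontiers, alpha=0.6):
    """Return frontiers sorted by score, best first.

    `frontiers` maps a frontier id to (visual_feat, semantic_feat) pairs,
    standing in for features of synthesized views near the frontier and
    of the local semantic context, respectively.
    """
    return sorted(
        (
            (alpha * cosine(goal_visual, vf)
             + (1 - alpha) * cosine(goal_semantic, sf), fid)
            for fid, (vf, sf) in frontiers.items()
        ),
        reverse=True,
    )

goal_vis, goal_sem = [1.0, 0.0], [0.0, 1.0]
frontiers = {
    "kitchen_door": ([0.9, 0.1], [0.1, 0.9]),  # strong on both terms
    "hallway":      ([0.1, 0.9], [0.9, 0.1]),  # weak on both terms
}
ranking = score_frontiers(goal_vis, goal_sem, frontiers)
print(ranking[0][1])  # → "kitchen_door"
```

Blending the two terms lets the robot favor frontiers that both look like the goal image and sit in semantically plausible contexts, rather than exploring purely by geometry.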