SplatSearch: Instance Image Goal Navigation for Mobile Robots using 3D Gaussian Splatting and Diffusion Models

📅 2025-11-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses Instance Image Goal Navigation (IIN) in unknown environments using a single reference image, where the reference viewpoint is arbitrary and the scene only supports sparse-view reconstruction. To tackle this, we propose a navigation framework integrating 3D Gaussian Splatting (3DGS) reconstruction, multi-view diffusion-based completion, and semantic visual context guidance. First, a sparse 3DGS map is constructed and used to render candidate object views from multiple viewpoints; second, diffusion models are employed to inpaint occluded regions and fill in missing viewpoints; finally, frontier exploration decisions are made by jointly leveraging rendered visual features and semantic context. To our knowledge, this is the first work to synergistically combine 3DGS and diffusion models for robotic target search, significantly improving matching robustness and semantic consistency. Evaluated in both simulated and real-world home environments, our method achieves state-of-the-art performance in success rate and path length. Ablation studies validate the effectiveness of each component.
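The matching step of the pipeline above can be sketched with a small, self-contained example: each candidate object gets several completed views, and the object whose best view matches the goal image is selected. All names and the toy feature vectors here are illustrative stand-ins, not the authors' API; the real system compares features of diffusion-completed 3DGS renders against the goal image.

```python
# Toy sketch of multi-view matching against a goal image.
# Feature vectors are hand-picked stand-ins for learned image features.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def best_candidate(goal_feat, candidates):
    """Pick the object whose best completed view matches the goal.

    `candidates` maps an object id to a list of per-view feature vectors,
    standing in for features of diffusion-completed 3DGS renders.
    """
    scored = {
        obj: max(cosine(goal_feat, view) for view in views)
        for obj, views in candidates.items()
    }
    return max(scored, key=scored.get), scored

goal = [1.0, 0.2, 0.0]
candidates = {
    "chair": [[0.9, 0.3, 0.1], [0.1, 0.9, 0.0]],  # one view matches well
    "plant": [[0.0, 1.0, 0.5], [0.2, 0.8, 0.9]],  # no view matches
}
obj, scores = best_candidate(goal, candidates)
print(obj)  # → "chair"
```

Taking the maximum over views is what makes the arbitrary-viewpoint setting tractable: only one synthesized view needs to line up with the reference image.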

📝 Abstract
The Instance Image Goal Navigation (IIN) problem requires mobile robots deployed in unknown environments to search for specific objects or people of interest using only a single reference goal image of the target. This problem is especially challenging when: 1) the reference image is captured from an arbitrary viewpoint, and 2) the robot must operate with sparse-view scene reconstructions. In this paper, we address the IIN problem by introducing SplatSearch, a novel architecture that leverages sparse-view 3D Gaussian Splatting (3DGS) reconstructions. SplatSearch renders multiple viewpoints around candidate objects using a sparse online 3DGS map, and uses a multi-view diffusion model to complete missing regions of the rendered images, enabling robust feature matching against the goal image. A novel frontier exploration policy is introduced which combines visual context from the synthesized viewpoints with semantic context from the goal image to evaluate frontier locations, allowing the robot to prioritize frontiers that are semantically and visually relevant to the goal image. Extensive experiments in photorealistic home and real-world environments demonstrate that SplatSearch outperforms current state-of-the-art methods in terms of Success Rate and Success Path Length. An ablation study confirms the design choices of SplatSearch.
Problem

Research questions and friction points this paper is trying to address.

Navigate robots using single reference images from arbitrary viewpoints
Enable object search with sparse-view 3D scene reconstructions
Improve frontier selection using semantic and visual goal relevance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses 3D Gaussian Splatting for sparse-view scene reconstruction
Employs multi-view diffusion model to complete missing image regions
Introduces frontier exploration policy with semantic visual context
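The frontier policy in the last bullet can be sketched as a weighted combination of visual and semantic relevance per frontier. The weight `alpha`, the helper names, and the feature vectors below are assumptions for illustration, not the paper's actual parameters.

```python
# Hypothetical frontier-scoring sketch: rank frontiers by a weighted mix
# of visual similarity (synthesized views near the frontier vs. the goal
# image) and semantic similarity (local context vs. goal semantics).

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def score_frontiers(goal_visual, goal_semantic, frontiers, alpha=0.6):
    """Return frontiers sorted by score, best first.

    `frontiers` maps a frontier id to (visual_feat, semantic_feat) pairs,
    standing in for features of synthesized views near the frontier and
    of the local semantic context, respectively.
    """
    return sorted(
        (
            (alpha * cosine(goal_visual, vf)
             + (1 - alpha) * cosine(goal_semantic, sf), fid)
            for fid, (vf, sf) in frontiers.items()
        ),
        reverse=True,
    )

goal_vis, goal_sem = [1.0, 0.0], [0.0, 1.0]
frontiers = {
    "kitchen_door": ([0.9, 0.1], [0.1, 0.9]),  # strong on both terms
    "hallway":      ([0.1, 0.9], [0.9, 0.1]),  # weak on both terms
}
ranking = score_frontiers(goal_vis, goal_sem, frontiers)
print(ranking[0][1])  # → "kitchen_door"
```

Blending the two terms lets the robot favor frontiers that both look like the goal image and sit in semantically plausible contexts, rather than exploring purely by geometry.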