🤖 AI Summary
In Object Goal Navigation (ObjectNav), conventional semantic mapping methods complete maps deterministically and ignore the uncertainty in indoor layouts, which limits their generalization to unseen environments. To address this, the authors propose GOAL, a generative, flow-based semantic map completion framework that integrates spatial priors from large language models (LLMs). During training, the LLM-derived spatial-semantic priors are encoded as 2D Gaussian fields and injected into the target maps, so the flow model learns a probabilistic semantic distribution over unobserved regions and can imagine plausible completions. The key contribution is the shift from discriminative map completion to LLM-informed generative completion for ObjectNav. GOAL achieves state-of-the-art performance on MP3D and Gibson and shows strong generalization when transferred to HM3D.
📝 Abstract
The Object Goal Navigation (ObjectNav) task requires an agent to locate a specified object in an unseen environment, which calls for imagining the unobserved regions of the scene. Prior approaches rely on deterministic, discriminative models to complete semantic maps; they overlook the inherent uncertainty in indoor layouts, which limits their ability to generalize to unseen environments. In this work, we propose GOAL, a generative flow-based framework that models the semantic distribution of indoor environments by bridging observed regions with LLM-enriched full-scene semantic maps. During training, spatial priors inferred from large language models (LLMs) are encoded as two-dimensional Gaussian fields and injected into target maps, distilling rich contextual knowledge into the flow model and enabling more generalizable completions. Extensive experiments demonstrate that GOAL achieves state-of-the-art performance on MP3D and Gibson and shows strong generalization in transfer settings to HM3D. Code and pretrained models are available at https://github.com/Badi-Li/GOAL.
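To make the idea of encoding a spatial prior as a two-dimensional Gaussian field more concrete, the following is a minimal NumPy sketch, not the GOAL implementation. The function names (`gaussian_field`, `inject_prior`), the isotropic Gaussian form, the `weight` parameter, the max-blending rule, and the toy map layout are all assumptions made for illustration; the abstract does not specify how the fields are parameterized or merged into the target maps.

```python
import numpy as np


def gaussian_field(height, width, center, sigma):
    """Return an (height, width) isotropic 2D Gaussian with peak value 1 at `center`."""
    rows, cols = np.mgrid[0:height, 0:width]
    cy, cx = center
    sq_dist = (rows - cy) ** 2 + (cols - cx) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def inject_prior(semantic_map, channel, center, sigma, weight=0.5):
    """Blend a Gaussian prior into one channel of a (C, H, W) semantic map.

    Assumption: the prior is merged with an element-wise maximum so that
    existing (observed) values in the channel are never reduced.
    """
    out = semantic_map.astype(np.float32).copy()
    _, h, w = out.shape
    field = weight * gaussian_field(h, w, center, sigma)
    out[channel] = np.maximum(out[channel], field)
    return out


# Hypothetical toy target map: 3 object categories on a 32x32 grid.
target = np.zeros((3, 32, 32), dtype=np.float32)
target[0, 10:14, 10:14] = 1.0  # cells already labeled with category 0

# Hypothetical prior: category 1 is likely to appear near cell (12, 18).
enriched = inject_prior(target, channel=1, center=(12, 18), sigma=3.0)
```

In this sketch the prior only adds soft probability mass around a suggested location, leaving observed labels untouched; a generative model trained on such enriched targets would see where co-occurring objects are likely, rather than only where they were annotated.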