Distilling LLM Prior to Flow Model for Generalizable Agent's Imagination in Object Goal Navigation

📅 2025-08-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
In Object Goal Navigation, conventional semantic mapping methods neglect uncertainty in indoor layouts, resulting in poor cross-environment generalization. To address this, we propose the first generative semantic map completion framework that integrates spatial priors from large language models (LLMs). Our method encodes LLM-derived spatial-semantic priors as 2D Gaussian fields and injects them into a flow-based generative model, enabling generative imagination and probabilistic modeling of the semantic distribution over unobserved regions. Our key contribution is the first use of LLM-driven generative semantic completion in vision-language navigation, departing from traditional discriminative mapping paradigms. Evaluated on MP3D and Gibson, our approach achieves state-of-the-art performance; notably, it generalizes significantly better when transferred to the HM3D dataset, validating its robustness in unseen environments.

📝 Abstract
The Object Goal Navigation (ObjectNav) task challenges agents to locate a specified object in an unseen environment by imagining unobserved regions of the scene. Prior approaches rely on deterministic, discriminative models to complete semantic maps, overlooking the inherent uncertainty in indoor layouts and limiting their ability to generalize to unseen environments. In this work, we propose GOAL, a generative flow-based framework that models the semantic distribution of indoor environments by bridging observed regions with LLM-enriched full-scene semantic maps. During training, spatial priors inferred from large language models (LLMs) are encoded as two-dimensional Gaussian fields and injected into target maps, distilling rich contextual knowledge into the flow model and enabling more generalizable completions. Extensive experiments demonstrate that GOAL achieves state-of-the-art performance on MP3D and Gibson, and shows strong generalization in transfer settings to HM3D. Code and pretrained models are available at https://github.com/Badi-Li/GOAL.
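The abstract describes encoding LLM spatial priors as two-dimensional Gaussian fields over the map. A minimal sketch of what such an encoding could look like, assuming a grid map where the LLM supplies likely object locations as cell coordinates (the function name, input format, and `sigma` value are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def gaussian_prior_field(grid_h, grid_w, centers, sigma=3.0):
    """Render LLM-suggested object locations as a 2D Gaussian prior field.

    centers: list of (row, col) cells where the LLM predicts the target
             object is likely to appear (hypothetical input format).
    Returns a (grid_h, grid_w) array with values in [0, 1], peaked at the
    suggested locations and decaying smoothly with distance.
    """
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    field = np.zeros((grid_h, grid_w), dtype=np.float64)
    for cy, cx in centers:
        field += np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    return np.clip(field, 0.0, 1.0)
```

A field like this can be added as an extra channel of (or blended into) the target semantic map, giving the generative model a soft, uncertainty-aware hint rather than a hard label.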
Problem

Research questions and friction points this paper is trying to address.

Modeling semantic distribution for Object Goal Navigation
Overcoming uncertainty in indoor layout generalization
Bridging observed regions with LLM-enriched semantic maps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative flow-based framework for semantic distribution
LLM-enriched semantic maps with Gaussian fields
Distilling contextual knowledge for generalizable completions
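The "generative flow-based framework" point refers to training a flow model toward the LLM-enriched target maps. A generic flow-matching objective, sketched here under the common straight-path (rectified-flow) formulation; the model interface and shapes are assumptions, not GOAL's actual architecture:

```python
import numpy as np

def flow_matching_loss(model, x1, rng):
    """Evaluate a flow-matching training objective on one batch (sketch).

    x1: batch of target maps with shape (B, C, H, W), e.g. LLM-enriched
    semantic maps. `model(x_t, t)` should predict a velocity field; for the
    straight path x_t = (1 - t) * x0 + t * x1, the target velocity is the
    constant x1 - x0.
    """
    x0 = rng.standard_normal(x1.shape)            # noise endpoint of the path
    t = rng.uniform(size=(x1.shape[0], 1, 1, 1))  # random time in [0, 1]
    x_t = (1 - t) * x0 + t * x1                   # point on the straight path
    v_target = x1 - x0                            # velocity along the path
    v_pred = model(x_t, t)
    return np.mean((v_pred - v_target) ** 2)      # regression loss on velocity
```

At inference, integrating the learned velocity field from noise toward t = 1 produces a sampled completion, which is what lets the model express a distribution over plausible layouts instead of a single deterministic map.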
Badi Li
School of Computer Science and Engineering, Sun Yat-sen University
Ren-jie Lu
School of Computer Science and Engineering, Sun Yat-sen University
Yu Zhou
School of Computer Science and Engineering, Sun Yat-sen University
Jingke Meng
Sun Yat-sen University
Wei-shi Zheng
School of Computer Science and Engineering, Sun Yat-sen University; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education