Distilling LLM Prior to Flow Model for Generalizable Agent's Imagination in Object Goal Navigation

📅 2025-08-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
In Object Goal Navigation, conventional semantic mapping methods neglect uncertainty in indoor layouts, resulting in poor cross-environment generalization. To address this, we propose the first generative semantic map completion framework that integrates spatial priors from large language models (LLMs). Our method encodes LLM-derived spatial-semantic priors as 2D Gaussian fields and injects them into a flow-based generative model, enabling generative imagination and probabilistic modeling of the semantic distribution over unobserved regions. Our key contribution is the first use of LLM-driven generative semantic completion in vision-language navigation, departing from traditional discriminative mapping paradigms. Evaluated on MP3D and Gibson, our approach achieves state-of-the-art performance; notably, it generalizes significantly better when transferred to the HM3D dataset, validating its robustness in unseen environments.

📝 Abstract
The Object Goal Navigation (ObjectNav) task challenges agents to locate a specified object in an unseen environment by imagining unobserved regions of the scene. Prior approaches rely on deterministic, discriminative models to complete semantic maps, overlooking the inherent uncertainty in indoor layouts and limiting their ability to generalize to unseen environments. In this work, we propose GOAL, a generative flow-based framework that models the semantic distribution of indoor environments by bridging observed regions with LLM-enriched full-scene semantic maps. During training, spatial priors inferred from large language models (LLMs) are encoded as two-dimensional Gaussian fields and injected into target maps, distilling rich contextual knowledge into the flow model and enabling more generalizable completions. Extensive experiments demonstrate that GOAL achieves state-of-the-art performance on MP3D and Gibson, and shows strong generalization in transfer settings to HM3D. Code and pretrained models are available at https://github.com/Badi-Li/GOAL.
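The abstract describes encoding LLM spatial priors as two-dimensional Gaussian fields over the map. A minimal sketch of what such an encoding could look like, assuming a grid map where the LLM supplies likely object locations as cell coordinates (the function name, input format, and `sigma` value are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def gaussian_prior_field(grid_h, grid_w, centers, sigma=3.0):
    """Render LLM-suggested object locations as a 2D Gaussian prior field.

    centers: list of (row, col) cells where the LLM predicts the target
             object is likely to appear (hypothetical input format).
    Returns a (grid_h, grid_w) array with values in [0, 1], peaked at the
    suggested locations and decaying smoothly with distance.
    """
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    field = np.zeros((grid_h, grid_w), dtype=np.float64)
    for cy, cx in centers:
        field += np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    return np.clip(field, 0.0, 1.0)
```

A field like this can be added as an extra channel of (or blended into) the target semantic map, giving the generative model a soft, uncertainty-aware hint rather than a hard label.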
Problem

Research questions and friction points this paper is trying to address.

Modeling semantic distribution for Object Goal Navigation
Overcoming uncertainty in indoor layout generalization
Bridging observed regions with LLM-enriched semantic maps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative flow-based framework for semantic distribution
LLM-enriched semantic maps with Gaussian fields
Distilling contextual knowledge for generalizable completions
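The "generative flow-based framework" point refers to training a flow model toward the LLM-enriched target maps. A generic flow-matching objective, sketched here under the common straight-path (rectified-flow) formulation; the model interface and shapes are assumptions, not GOAL's actual architecture:

```python
import numpy as np

def flow_matching_loss(model, x1, rng):
    """Evaluate a flow-matching training objective on one batch (sketch).

    x1: batch of target maps with shape (B, C, H, W), e.g. LLM-enriched
    semantic maps. `model(x_t, t)` should predict a velocity field; for the
    straight path x_t = (1 - t) * x0 + t * x1, the target velocity is the
    constant x1 - x0.
    """
    x0 = rng.standard_normal(x1.shape)            # noise endpoint of the path
    t = rng.uniform(size=(x1.shape[0], 1, 1, 1))  # random time in [0, 1]
    x_t = (1 - t) * x0 + t * x1                   # point on the straight path
    v_target = x1 - x0                            # velocity along the path
    v_pred = model(x_t, t)
    return np.mean((v_pred - v_target) ** 2)      # regression loss on velocity
```

At inference, integrating the learned velocity field from noise toward t = 1 produces a sampled completion, which is what lets the model express a distribution over plausible layouts instead of a single deterministic map.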
Badi Li
School of Computer Science and Engineering, Sun Yat-sen University
Ren-jie Lu
School of Computer Science and Engineering, Sun Yat-sen University
Yu Zhou
School of Computer Science and Engineering, Sun Yat-sen University
Jingke Meng
Sun Yat-sen University
Wei-shi Zheng
School of Computer Science and Engineering, Sun Yat-sen University; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education