🤖 AI Summary
To address low search efficiency for target objects in unknown environments, the difficulty of maintaining implicit memory for long-horizon planning, and the lack of fine-grained semantic information in explicit maps, this paper proposes a modular navigation framework. Its core contribution is the Frontier-Object Map, a novel online semantic map that jointly encodes spatial frontier structure and object-level semantics, tightly coupled with a vision-language model (VLM) to coordinate high-level goal reasoning with low-level path planning. The framework supports real-time multimodal scene understanding and incremental mapping, and the VLM is trained on a large-scale, automatically generated navigation dataset built from real-world scanned scenes. Evaluated on the MP3D and HM3D benchmarks, the approach achieves state-of-the-art SPL. Deployment on a physical robot further demonstrates robust real-world performance, significantly improving long-range navigation robustness and target recognition accuracy.
📝 Abstract
This paper addresses the Object Goal Navigation problem, in which a robot must efficiently find a target object in an unknown environment. Existing implicit memory-based methods struggle with long-term memory retention and planning, while explicit map-based approaches lack rich semantic information. To address these challenges, we propose FOM-Nav, a modular framework that enhances exploration efficiency through Frontier-Object Maps and vision-language models. Our Frontier-Object Maps are built online and jointly encode spatial frontiers and fine-grained object information. Using this representation, a vision-language model performs multimodal scene understanding and high-level goal prediction, which a low-level planner executes for efficient trajectory generation. To train FOM-Nav, we automatically construct large-scale navigation datasets from real-world scanned environments. Extensive experiments validate the effectiveness of our model design and constructed dataset. FOM-Nav achieves state-of-the-art performance on the MP3D and HM3D benchmarks, particularly on the navigation-efficiency metric SPL, and yields promising results on a real robot.