AI Summary
Visual navigation faces a critical challenge: the scarcity of real-world human navigation data hinders supervised learning of environment representations. To address this, we introduce the first large-scale, open-source dataset enabling supervised, human-centric navigation learning, spanning multi-scale indoor/outdoor real and synthetic scenes. The dataset synchronously captures RGB-D observations, human click trajectories, and expert-annotated explicit landmarks. We propose a novel joint supervision signal combining click trajectories and landmark annotations, supporting end-to-end interpretable waypoint prediction and graph-structured mapping. Through human-in-the-loop annotation, trajectory-level semantic tagging, and cross-scene standardization, our approach significantly improves exploration policy training and localization robustness. The dataset is publicly hosted on Hugging Face (DOI: 10.57967/hf/2386), advancing the paradigm of representation learning for visual navigation.
Abstract
Map representations learned from expert demonstrations have shown promising research value. However, visual navigation still lacks real-world human-navigation datasets that can support efficient, supervised representation learning of environments. We present the Landmark-Aware Visual Navigation (LAVN) dataset to enable supervised learning of human-centric exploration policies and map building. We collect RGB-D observation and human point-click pairs as a human annotator explores virtual and real-world environments with the goal of fully covering the space. The annotators also provide distinct landmark examples along each trajectory, which we expect will simplify map or graph building and localization. These human point-clicks serve as direct supervision for waypoint prediction when learning to explore in environments. Our dataset covers a wide spectrum of scenes, from rooms in indoor environments to walkways outdoors. We release our dataset with detailed documentation at https://huggingface.co/datasets/visnavdataset/lavn (DOI: 10.57967/hf/2386), along with a plan for long-term preservation.
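To make the supervision signal concrete, the sketch below models one trajectory step as a record pairing an RGB-D observation with a human point-click and an optional landmark annotation, then extracts the click coordinates that would serve as waypoint-prediction targets. The field names and shapes here are illustrative assumptions, not the dataset's actual schema; consult the documentation on the Hugging Face page for the real format.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical record layout for one LAVN trajectory step; the actual
# field names and array shapes are defined by the dataset itself.
@dataclass
class LAVNStep:
    rgb: list                          # RGB frame (H x W x 3), placeholder type
    depth: list                        # aligned depth map (H x W)
    click_xy: Tuple[int, int]          # human point-click in image coordinates
    is_landmark: bool = False          # expert-annotated landmark flag
    landmark_label: str = ""           # free-form landmark description

def waypoint_targets(trajectory: List[LAVNStep]) -> List[Tuple[int, int]]:
    """Collect the click coordinates that supervise waypoint prediction."""
    return [step.click_xy for step in trajectory]

# Minimal usage: a toy two-step trajectory with one landmark annotation.
traj = [
    LAVNStep(rgb=[], depth=[], click_xy=(120, 64)),
    LAVNStep(rgb=[], depth=[], click_xy=(200, 80),
             is_landmark=True, landmark_label="red door"),
]
print(waypoint_targets(traj))  # [(120, 64), (200, 80)]
```

A learned exploration policy would regress these click coordinates from the RGB-D input, while landmark-flagged steps seed nodes for graph-structured mapping.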