AI Summary
Visual navigation faces a critical challenge: the scarcity of real-world human navigation data hinders supervised learning of environment representations. To address this, we introduce the first large-scale, open-source dataset enabling supervised, human-centric navigation learning, spanning multi-scale indoor/outdoor real and synthetic scenes. The dataset synchronously captures RGB-D observations, human click trajectories, and expert-annotated explicit landmarks. We propose a novel joint supervision signal combining click trajectories and landmark annotations, supporting end-to-end interpretable waypoint prediction and graph-structured mapping. Through human-in-the-loop annotation, trajectory-level semantic tagging, and cross-scene standardization, our approach significantly improves exploration policy training and localization robustness. The dataset is publicly hosted on Hugging Face (DOI: 10.57967/hf/2386), advancing the paradigm of representation learning for visual navigation.
Abstract
Map representations learned from expert demonstrations have shown promising research value. However, visual navigation still lacks real-world human-navigation datasets that can support efficient, supervised representation learning of environments. We present the Landmark-Aware Visual Navigation (LAVN) dataset to enable supervised learning of human-centric exploration policies and map building. We collect RGB-D observation and human point-click pairs as a human annotator explores virtual and real-world environments with the goal of fully covering the space. The annotators also provide distinct landmark examples along each trajectory, which we expect will simplify map or graph building and localization. These human point-clicks serve as direct supervision for waypoint prediction when learning to explore in environments. Our dataset covers a wide spectrum of scenes, from rooms in indoor environments to walkways outdoors. We release our dataset with detailed documentation at https://huggingface.co/datasets/visnavdataset/lavn (DOI: 10.57967/hf/2386), along with a plan for long-term preservation.
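To make the supervision signal concrete, the sketch below models one trajectory step as a record pairing an RGB-D observation with a human point-click and an optional landmark annotation, then extracts the click coordinates that would serve as waypoint-prediction targets. The field names and shapes here are illustrative assumptions, not the dataset's actual schema; consult the documentation on the Hugging Face page for the real format.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical record layout for one LAVN trajectory step; the actual
# field names and array shapes are defined by the dataset itself.
@dataclass
class LAVNStep:
    rgb: list                          # RGB frame (H x W x 3), placeholder type
    depth: list                        # aligned depth map (H x W)
    click_xy: Tuple[int, int]          # human point-click in image coordinates
    is_landmark: bool = False          # expert-annotated landmark flag
    landmark_label: str = ""           # free-form landmark description

def waypoint_targets(trajectory: List[LAVNStep]) -> List[Tuple[int, int]]:
    """Collect the click coordinates that supervise waypoint prediction."""
    return [step.click_xy for step in trajectory]

# Minimal usage: a toy two-step trajectory with one landmark annotation.
traj = [
    LAVNStep(rgb=[], depth=[], click_xy=(120, 64)),
    LAVNStep(rgb=[], depth=[], click_xy=(200, 80),
             is_landmark=True, landmark_label="red door"),
]
print(waypoint_targets(traj))  # [(120, 64), (200, 80)]
```

A learned exploration policy would regress these click coordinates from the RGB-D input, while landmark-flagged steps seed nodes for graph-structured mapping.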