A Landmark-Aware Visual Navigation Dataset

📅 2024-02-22
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Visual navigation faces a critical challenge: the scarcity of real-world human navigation data hinders supervised learning of environment representations. To address this, we introduce the first large-scale, open-source dataset enabling supervised, human-centric navigation learning, spanning multi-scale indoor and outdoor real and synthetic scenes. The dataset synchronously captures RGB-D observations, human click trajectories, and expert-annotated explicit landmarks. We propose a novel joint supervision signal combining click trajectories and landmark annotations, supporting end-to-end interpretable waypoint prediction and graph-structured mapping. Through human-in-the-loop annotation, trajectory-level semantic tagging, and cross-scene standardization, our approach significantly improves exploration policy training and localization robustness. The dataset is publicly hosted on Hugging Face (DOI: 10.57967/hf/2386), advancing the paradigm of representation learning for visual navigation.

๐Ÿ“ Abstract
Map representations learned from expert demonstrations have shown promising research value. However, the field of visual navigation still faces challenges due to the lack of real-world human-navigation datasets that can support efficient, supervised representation learning of environments. We present a Landmark-Aware Visual Navigation (LAVN) dataset to allow for supervised learning of human-centric exploration policies and map building. We collect RGB-D observation and human point-click pairs as a human annotator explores virtual and real-world environments with the goal of full-coverage exploration of the space. The human annotators also provide distinct landmark examples along each trajectory, which we intuit will simplify the task of map or graph building and localization. These human point-clicks serve as direct supervision for waypoint prediction when learning to explore in environments. Our dataset covers a wide spectrum of scenes, including rooms in indoor environments as well as walkways outdoors. We release our dataset with detailed documentation at https://huggingface.co/datasets/visnavdataset/lavn (DOI: 10.57967/hf/2386) and a plan for long-term preservation.
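
Since the dataset is distributed through Hugging Face, one way to obtain a local copy is with the `huggingface_hub` client. The snippet below is a minimal sketch, not documented usage from the paper: the repository id comes from the URL above, but the on-disk layout printed at the end is whatever the maintainers ship and is not assumed here.

```python
# Minimal sketch for fetching the LAVN dataset repository from Hugging Face.
# Requires the `huggingface_hub` package; file names and formats inside the
# snapshot are not specified here, so the listing below is purely exploratory.
from pathlib import Path

from huggingface_hub import snapshot_download

# Download (or reuse a cached copy of) the dataset repository.
local_dir = snapshot_download(
    repo_id="visnavdataset/lavn",
    repo_type="dataset",
)

# Inspect the first few downloaded files; consult the dataset card for the
# authoritative description of trajectories, RGB-D frames, and landmarks.
for path in sorted(Path(local_dir).rglob("*"))[:20]:
    print(path.relative_to(local_dir))
```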
Problem

Research questions and friction points this paper is trying to address.

Lack of real-world human-navigation datasets
Supervised learning of human-centric exploration policies
Map building and localization simplification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Supervised learning of exploration policies
RGB-D observations paired with human point-clicks as direct waypoint supervision (sketched below)
Landmark examples for map building
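
To make the supervision signal concrete, here is a minimal PyTorch sketch of waypoint prediction from RGB-D input trained against human point-clicks. This is not the authors' model; the architecture, input resolution, and normalized click coordinates are illustrative assumptions.

```python
# A small CNN maps an RGB-D frame to a 2D waypoint in image coordinates and
# is regressed toward the human click. All design choices here are assumed
# for illustration, not taken from the paper.
import torch
import torch.nn as nn

class WaypointPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(            # 4-channel input: RGB + depth
            nn.Conv2d(4, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(128, 2)             # predicted (x, y) waypoint

    def forward(self, rgbd):
        # Sigmoid keeps the prediction in [0, 1], matching normalized clicks.
        return torch.sigmoid(self.head(self.backbone(rgbd)))

model = WaypointPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative training step with dummy tensors standing in for a batch
# of RGB-D observations and their human point-click labels from the dataset.
rgbd = torch.rand(8, 4, 128, 128)                 # batch of RGB-D frames
clicks = torch.rand(8, 2)                         # normalized human point-clicks
optimizer.zero_grad()
loss = nn.functional.mse_loss(model(rgbd), clicks)
loss.backward()
optimizer.step()
```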
Faith Johnson
Rutgers University, New Brunswick, NJ, USA
Bryan Bo Cao
Stony Brook University, Stony Brook, NY, USA
Kristin J. Dana
Rutgers University, New Brunswick, NJ, USA
Shubham Jain
Stony Brook University, Stony Brook, NY, USA
Ashwin Ashok
Georgia State University, Atlanta, GA, USA