🤖 AI Summary
This work addresses the challenging problem of reconstructing complete, metrically accurate bird's-eye-view (BEV) indoor floor maps from a single egocentric RGB image, where severe information loss makes the unobserved layout inherently uncertain. To this end, we establish a unified benchmark for single-view BEV floor completion that compares training-free methods, deterministic models, ensembles, and stochastic generative models, and we instantiate the task as an end-to-end monocular RGB-to-floor-map pipeline. We further introduce FlatLands, a large-scale, high-quality dataset of 270,575 observations from 17,656 real metric indoor scenes, each with aligned observation, visibility, validity, and ground-truth BEV maps, designed to support both in-distribution and out-of-distribution evaluation. Together, the dataset and benchmark provide a rigorous platform for advancing uncertainty-aware mapping and plausible completion in embodied navigation.
📝 Abstract
A single egocentric image typically captures only a small portion of the floor, yet a complete metric traversability map of the surroundings would better serve applications such as indoor navigation. We introduce FlatLands, a dataset and benchmark for single-view bird's-eye-view (BEV) floor completion. The dataset contains 270,575 observations from 17,656 real metric indoor scenes drawn from six existing datasets, each with aligned observation, visibility, validity, and ground-truth BEV maps, and the benchmark includes both in- and out-of-distribution evaluation protocols. We compare training-free approaches, deterministic models, ensembles, and stochastic generative models. Finally, we instantiate the task as an end-to-end monocular RGB-to-floor-map pipeline. FlatLands provides a rigorous testbed for uncertainty-aware indoor mapping and generative completion for embodied navigation.
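As a concrete illustration of the data layout the abstract describes, the minimal sketch below builds a synthetic sample with the four aligned BEV maps (observation, visibility, validity, ground truth) and scores a completion on the unobserved-but-valid region. This is not the paper's actual format or metric: the field names (`observation`, `visibility`, `validity`, `gt_bev`), the map shapes, and the choice of IoU as the score are all assumptions made for illustration.

```python
import numpy as np

def make_dummy_sample(size=128, seed=0):
    """Synthetic stand-in for a FlatLands-style sample (hypothetical
    field names): four aligned BEV maps over the same metric grid."""
    rng = np.random.default_rng(seed)
    gt = rng.random((size, size)) > 0.3             # ground-truth traversable floor
    visibility = np.zeros((size, size), dtype=bool)
    visibility[size // 2:, size // 4: 3 * size // 4] = True  # crude camera-frustum wedge
    validity = np.ones((size, size), dtype=bool)    # cells where ground truth is defined
    observation = gt & visibility                   # floor actually seen in the single view
    return {"observation": observation, "visibility": visibility,
            "validity": validity, "gt_bev": gt}

def completion_iou(pred, sample):
    """One plausible score: IoU restricted to valid cells the camera did
    not see, i.e. exactly the region the model had to complete."""
    region = sample["validity"] & ~sample["visibility"]
    inter = (pred & sample["gt_bev"] & region).sum()
    union = ((pred | sample["gt_bev"]) & region).sum()
    return inter / max(union, 1)

sample = make_dummy_sample()
# Training-free copy baseline: predict only what was observed, complete nothing.
print("copy-baseline IoU:", completion_iou(sample["observation"], sample))
```

Under this setup the copy baseline scores zero by construction, which makes it a useful sanity check: any method that hallucinates plausible unseen floor should beat it.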