AI Summary
Monocular depth prediction for planar-map-based indoor localization suffers from two key limitations: (i) a lack of explicit uncertainty modeling, and (ii) reliance on scene-specific depth networks, which hinders generalization and robustness in dynamic environments. Method: This paper proposes a probabilistic sequence-based camera localization framework that requires no scene-specific training. It explicitly models monocular depth prediction as a probability distribution, tightly integrating uncertainty estimation into a planar-map-constrained sequential optimization pipeline. Crucially, it fully leverages off-the-shelf pre-trained depth models without fine-tuning. Contribution/Results: The approach significantly improves generalization and robustness to environmental changes. On the LaMAR HGE dataset, it achieves a 2.7× improvement in localization recall for long sequences (100 frames) and a 16.7× improvement for short sequences (15 frames), substantially outperforming state-of-the-art methods.
Abstract
We propose UnLoc, an efficient data-driven solution for sequential camera localization within floorplans. Floorplan data is readily available, long-term persistent, and robust to changes in visual appearance. We address key limitations of recent methods, such as the lack of uncertainty modeling in depth predictions and the necessity for custom depth networks trained for each environment. We introduce a novel probabilistic model that incorporates uncertainty estimation, modeling depth predictions as explicit probability distributions. By leveraging off-the-shelf pre-trained monocular depth models, we eliminate the need to rely on per-environment-trained depth networks, enhancing generalization to unseen spaces. We evaluate UnLoc on large-scale synthetic and real-world datasets, demonstrating significant improvements over existing methods in terms of accuracy and robustness. Notably, we achieve $2.7$ times higher localization recall on long sequences (100 frames) and $16.7$ times higher on short ones (15 frames) than the state of the art on the challenging LaMAR HGE dataset.
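The core idea of scoring camera poses against a floorplan with probabilistic depth can be sketched as follows. This is a minimal illustrative example, not the paper's actual implementation: the 2D raycasting, grid-search localization, and all function names (`raycast`, `pose_log_likelihood`, `localize`) are assumptions. It treats each predicted depth as a Gaussian (mean, standard deviation) and evaluates how likely a candidate pose is given depths raycast from the floorplan, rather than comparing point estimates.

```python
import numpy as np

def raycast(walls, pose, angles, max_depth=20.0):
    """Depth along each ray from `pose` to the nearest floorplan wall.

    walls:  list of 2D segments (x1, y1, x2, y2)
    pose:   (x, y, theta) in map coordinates
    angles: ray angles relative to the camera heading
    """
    x, y, theta = pose
    p = np.array([x, y])
    depths = np.full(len(angles), max_depth)
    for i, a in enumerate(angles):
        d = np.array([np.cos(theta + a), np.sin(theta + a)])
        best = max_depth
        for x1, y1, x2, y2 in walls:
            e = np.array([x2 - x1, y2 - y1])          # segment direction
            denom = d[0] * e[1] - d[1] * e[0]         # cross(d, e)
            if abs(denom) < 1e-12:                    # ray parallel to wall
                continue
            r = np.array([x1, y1]) - p
            t = (r[0] * e[1] - r[1] * e[0]) / denom   # distance along ray
            u = (r[0] * d[1] - r[1] * d[0]) / denom   # position along segment
            if t > 1e-9 and 0.0 <= u <= 1.0:
                best = min(best, t)
        depths[i] = best
    return depths

def pose_log_likelihood(walls, pose, angles, depth_mu, depth_sigma):
    """Gaussian log-likelihood of predicted depths vs. map-rendered depths."""
    expected = raycast(walls, pose, angles)
    return -0.5 * np.sum(((depth_mu - expected) / depth_sigma) ** 2
                         + np.log(2.0 * np.pi * depth_sigma ** 2))

def localize(walls, angles, depth_mu, depth_sigma, candidates):
    """Pick the candidate pose that best explains the depth distribution."""
    scores = [pose_log_likelihood(walls, c, angles, depth_mu, depth_sigma)
              for c in candidates]
    return candidates[int(np.argmax(scores))]
```

In a sequential setting, per-frame likelihoods like these would be combined across the trajectory (e.g. with motion constraints) instead of being maximized frame by frame; the per-ray `depth_sigma` lets unreliable depth predictions contribute less to the score.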