🤖 AI Summary
This work proposes a lightweight, texture- and pose-prior-free 6-DoF camera relocalization method tailored for structured indoor environments. It introduces 3D planar primitives as region-level structural semantic representations and establishes cross-modal correspondences between query images and a sparse plane-based map through a deep matcher operating in a unified embedding space. Camera poses are then recovered via robust optimization, eliminating the need for photorealistic textures, initial pose estimates, or scene-specific training. Evaluated on multiple benchmarks including ScanNet and 12Scenes, the framework achieves high accuracy and efficiency, demonstrating the effectiveness and generalizability of a structure-primitive-based relocalization paradigm.
📝 Abstract
While structure-based relocalizers have long relied on point correspondences to establish or regress query-map associations, in this paper we pioneer the use of planar primitives and 3D planar maps for lightweight 6-DoF camera relocalization in structured environments. Planar primitives, beyond being fundamental entities in projective geometry, also serve as region-based representations that encapsulate both structural and semantic richness. This motivates us to introduce PlanaReLoc, a streamlined plane-centric paradigm in which a deep matcher associates planar primitives across the query image and the map within a learned unified embedding space, after which the 6-DoF pose is solved and refined under a robust framework. Through comprehensive experiments on the ScanNet and 12Scenes datasets across hundreds of scenes, our method demonstrates the superiority of planar primitives in facilitating reliable cross-modal structural correspondences and achieving effective camera relocalization without requiring realistically textured/colored maps, pose priors, or per-scene training. The code and data are available at https://github.com/3dv-casia/PlanaReLoc.
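The abstract does not spell out how a 6-DoF pose is recovered from plane correspondences, but the standard closed-form construction is instructive. The sketch below is illustrative only (it is not the paper's actual solver, and the function name and plane convention `n·x = d` are assumptions): given K ≥ 3 matched planes with non-degenerate normals, the rotation follows from a Kabsch/SVD alignment of the normals, and the translation from a least-squares fit on the plane offsets.

```python
import numpy as np

def pose_from_plane_matches(n_map, d_map, n_cam, d_cam):
    """Hypothetical closed-form pose from >=3 matched planes.

    Planes use the convention n . x = d with unit normals.
    n_map, n_cam: (K, 3) normals in the map and camera frames.
    d_map, d_cam: (K,) offsets.
    Returns R, t such that x_cam = R @ x_map + t.
    """
    # Rotation: Kabsch alignment of normal pairs (n_cam ~ R @ n_map).
    H = n_map.T @ n_cam                     # 3x3 cross-covariance of normals
    U, _, Vt = np.linalg.svd(H)
    # Sign correction keeps R a proper rotation (det = +1).
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    # Translation: for each match, d_cam = d_map + n_cam . t,
    # so stack the normals and solve the linear least-squares system.
    t, *_ = np.linalg.lstsq(n_cam, d_cam - d_map, rcond=None)
    return R, t
```

In a robust pipeline such as the one the abstract describes, a solver of this kind would typically sit inside a RANSAC loop over the matcher's plane correspondences, with the resulting pose refined by nonlinear optimization.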