🤖 AI Summary
Problem: State-of-the-art (structure-based) visual localization relies on a fixed 3D scene model, which makes it hard to adapt to changing environments and costly to update when the scene changes.
Method: Rather than proposing a new algorithm, the paper systematically surveys and compares “structureless” visual localization, which replaces an explicit 3D scene model with an easily updated database of images with known poses. The compared approaches draw on multi-view geometry, image retrieval, 2D–2D feature matching, PnP, and nonlinear optimization, and the study contrasts methods built on classical geometric reasoning with end-to-end learning (pose regression) approaches.
Contribution/Results: The paper presents, to the authors' knowledge, the first comprehensive discussion and experimental comparison of structureless methods. Approaches that use more classical geometric reasoning generally achieve higher pose accuracy; in particular, methods based on absolute or semi-generalized relative pose estimation outperform recent pose regression models by a wide margin. Compared with state-of-the-art structure-based methods, structureless methods are (slightly) less accurate but far more flexible, since the scene representation can be updated by adding or removing images.
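One common building block of geometric structureless pipelines (assumed here for illustration; the summary does not specify which surveyed methods use it) is triangulating 3D points on the fly from 2D–2D matches between database images whose poses are known, so that a PnP solver can then estimate the query pose. The minimal pure-Python sketch below shows only the triangulation step. It assumes a pinhole camera with intrinsics fx, fy, cx, cy and the convention x_cam = R (X - C). The function names `backproject` and `triangulate_midpoint` are hypothetical, and the simple midpoint method stands in for whatever triangulation the surveyed methods actually use.

```python
"""Midpoint triangulation from two posed database images (illustrative sketch)."""
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]  # row-major 3x3 rotation matrix (world-to-camera)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _point_on_ray(origin: Vec3, direction: Vec3, s: float) -> Vec3:
    # origin + s * direction
    return (origin[0] + s * direction[0],
            origin[1] + s * direction[1],
            origin[2] + s * direction[2])


def backproject(pixel: Tuple[float, float], fx: float, fy: float,
                cx: float, cy: float, R: Mat3) -> Vec3:
    """World-frame viewing-ray direction R^T K^-1 [u, v, 1]^T (assumes x_cam = R (X - C))."""
    u, v = pixel
    ray_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # (R^T r)_j = sum_i R[i][j] * r_i
    return (sum(R[i][0] * ray_cam[i] for i in range(3)),
            sum(R[i][1] * ray_cam[i] for i in range(3)),
            sum(R[i][2] * ray_cam[i] for i in range(3)))


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w0 = (c1[0] - c2[0], c1[1] - c2[1], c1[2] - c2[2])
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; baseline too small")
    s = (b * e - c * d) / denom  # closest-point parameter on ray 1
    t = (a * e - b * d) / denom  # closest-point parameter on ray 2
    p1 = _point_on_ray(c1, d1, s)
    p2 = _point_on_ray(c2, d2, t)
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2, (p1[2] + p2[2]) / 2)


# Usage: two database cameras with identity rotation, centers 1 unit apart on x,
# observing the same scene point at pixels (12.5, 5.0) and (-12.5, 5.0).
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ray1 = backproject((12.5, 5.0), fx=100.0, fy=100.0, cx=0.0, cy=0.0, R=I3)
ray2 = backproject((-12.5, 5.0), fx=100.0, fy=100.0, cx=0.0, cy=0.0, R=I3)
X = triangulate_midpoint((0.0, 0.0, 0.0), ray1, (1.0, 0.0, 0.0), ray2)
print(X)  # approximately (0.5, 0.2, 4.0)
```

The sketch omits a cheirality check (points behind a camera) and any robust estimation; in practice, triangulated points from noisy matches would be filtered before being passed to a RANSAC-based PnP solver.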
📝 Abstract
Visual localization algorithms, i.e., methods that estimate the camera pose of a query image in a known scene, are core components of many applications, including self-driving cars and augmented/mixed reality systems. State-of-the-art visual localization algorithms are structure-based, i.e., they store a 3D model of the scene and use 2D-3D correspondences between the query image and 3D points in the model for camera pose estimation. While such approaches are highly accurate, they are also rather inflexible when the underlying 3D model has to be adjusted after changes in the scene. Structureless localization approaches represent the scene as a database of images with known poses and thus offer a much more flexible representation that can be updated simply by adding or removing images. Although there is a large body of literature on structure-based approaches, there is significantly less work on structureless methods. Hence, this paper provides what is, to the best of our knowledge, the first comprehensive discussion and comparison of structureless methods. Extensive experiments show that approaches that use a higher degree of classical geometric reasoning generally achieve higher pose accuracy. In particular, approaches based on classical absolute or semi-generalized relative pose estimation outperform very recent methods based on pose regression by a wide margin. Compared with state-of-the-art structure-based approaches, the flexibility of structureless methods comes at the cost of (slightly) lower pose accuracy, indicating an interesting direction for future work.
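The flexibility argument can be made concrete with a minimal sketch. It assumes a scene represented only by posed database images plus one global descriptor per image for retrieval; the descriptor format, the cosine-similarity ranking, and the names `StructurelessMap`, `add_image`, `remove_image`, and `retrieve` are illustrative assumptions, not the paper's API. Adding or removing an image is a dictionary update, with no 3D model to re-triangulate or re-optimize.

```python
"""A structureless scene map: posed images + global descriptors, no 3D points (sketch)."""
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class DbImage:
    descriptor: Tuple[float, ...]      # unit-norm global descriptor (assumed given)
    rotation: Tuple[Vec3, Vec3, Vec3]  # world-to-camera rotation R
    center: Vec3                       # camera center C in world coordinates


def _normalize(v: Sequence[float]) -> Tuple[float, ...]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("descriptor must be non-zero")
    return tuple(x / norm for x in v)


class StructurelessMap:
    """Hypothetical container: the scene is just a set of posed images."""

    def __init__(self) -> None:
        self._images: Dict[str, DbImage] = {}

    def __len__(self) -> int:
        return len(self._images)

    def add_image(self, image_id: str, descriptor: Sequence[float],
                  rotation: Tuple[Vec3, Vec3, Vec3], center: Vec3) -> None:
        # Updating the map is an insert; no 3D model has to be rebuilt.
        self._images[image_id] = DbImage(_normalize(descriptor), rotation, center)

    def remove_image(self, image_id: str) -> None:
        # Images showing outdated scene content can simply be dropped.
        self._images.pop(image_id, None)

    def retrieve(self, query_descriptor: Sequence[float], k: int) -> List[Tuple[str, float]]:
        """Brute-force top-k retrieval by cosine similarity."""
        q = _normalize(query_descriptor)
        scored = [(image_id, sum(a * b for a, b in zip(q, img.descriptor)))
                  for image_id, img in self._images.items()]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]


I3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
scene = StructurelessMap()
scene.add_image("a", (1.0, 0.0), I3, (0.0, 0.0, 0.0))
scene.add_image("b", (0.0, 1.0), I3, (1.0, 0.0, 0.0))
scene.add_image("c", (1.0, 1.0), I3, (2.0, 0.0, 0.0))
before = [image_id for image_id, _ in scene.retrieve((1.0, 0.1), k=2)]
scene.remove_image("a")  # e.g. the part of the scene seen by "a" has changed
after = [image_id for image_id, _ in scene.retrieve((1.0, 0.1), k=2)]
print(before, after)  # ['a', 'c'] ['c', 'b']
```

Retrieval here is brute force for clarity. The retrieved images and their known poses are what a downstream pose estimator, whether geometric or learned, would consume.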