Efficient Sparse-to-Dense Visual Localization via Compact Gaussian Scene Representation and Accelerated Dense Pose Estimation

2026-05-17
Citations: 0
Influential: 0
AI Summary
This work proposes LiteLoc, an efficient visual localization method built on 3D Gaussian Splatting. It addresses the storage redundancy and computational latency of existing sparse-to-dense approaches such as STDLoc, and is also easier to train. The first contribution decouples the feature field from the color field to build a compact, color-free scene representation, eliminating approximately 94% of redundant storage. The second is a match condensation strategy that distills the dense matches into a representative 5% subset, giving a nearly 19× speedup in the robust estimation step of the dense PnP solver with negligible loss of accuracy. LiteLoc surpasses STDLoc in multiple scenes while significantly lowering memory consumption and inference latency, making it well suited to latency-sensitive visual localization.
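To make the storage argument concrete, the following minimal sketch counts the floats stored per Gaussian primitive with and without a color field. The attribute layout (3 for position, 3 for scale, 4 for a rotation quaternion, 1 for opacity, and 48 spherical-harmonic color coefficients at degree 3) follows the standard 3DGS parameterization. The feature dimension, the primitive counts, the choice to keep opacity in the color-free model, and the function names are all assumptions made for illustration. None of them are figures or APIs from the paper.

```python
from typing import Optional


def floats_per_gaussian(feature_dim: int, sh_degree: Optional[int]) -> int:
    """Floats stored per primitive: geometry + opacity + optional color + feature."""
    geometry = 3 + 3 + 4  # position, scale, rotation quaternion
    opacity = 1           # assumed to be kept for alpha-blending features
    # Spherical-harmonic color: (degree + 1)^2 coefficients per RGB channel.
    color = 0 if sh_degree is None else 3 * (sh_degree + 1) ** 2
    return geometry + opacity + color + feature_dim


def storage_reduction(n_full: int, n_compact: int,
                      feature_dim: int, sh_degree: int = 3) -> float:
    """Fraction of storage saved by a color-free model with n_compact primitives."""
    full = n_full * floats_per_gaussian(feature_dim, sh_degree)
    compact = n_compact * floats_per_gaussian(feature_dim, None)
    return 1.0 - compact / full


if __name__ == "__main__":
    # Hypothetical counts and feature size, chosen only to exercise the formula.
    saved = storage_reduction(n_full=1_000_000, n_compact=200_000, feature_dim=16)
    print(f"storage saved: {saved:.1%}")
```

The sketch separates the two sources of saving suggested by the abstract: dropping the per-primitive color coefficients, and needing fewer primitives once high-frequency photometric detail no longer has to be reconstructed.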
Abstract
This letter presents LiteLoc, a novel and efficient localizer built on 3D Gaussian Splatting (3DGS). The previous state-of-the-art (SoTA) sparse-to-dense localizer, STDLoc, has shown remarkable localization capability but suffers from severe storage redundancy and computational latency. By revisiting its design decisions, we derive two simple yet highly effective improvements that cumulatively make LiteLoc much more efficient in both memory and computation, while also being easier to train. One key observation is that the color field, inherited directly from Feature 3DGS, is functionally useless for localization. Yet, its reconstruction of high-frequency photometric details necessitates excessive Gaussian primitives, resulting in a tightly coupled color-feature representation with significant memory overhead and sub-optimal feature field optimization. To resolve this, we propose a color-free decoupled feature field that constructs a compact Gaussian scene representation by retaining only task-essential feature attributes, thereby eliminating approximately 94% of redundant storage with no loss of localization-relevant information. We further find that the primary computational bottleneck lies in the dense Perspective-n-Point (PnP) solver, where most matches contribute saturated geometric constraints with diminishing accuracy gains. Accordingly, we propose a condensing strategy that distills dense matches into a subset of 5% representative matches, enabling a nearly 19-fold speedup in robust estimation with negligible performance drop. Extensive experiments show that LiteLoc surpasses STDLoc in multiple scenes with considerable efficiency benefits, opening up exciting prospects for latency-sensitive visual localization.
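The abstract does not say how the representative matches are selected, so the sketch below is a stand-in rather than LiteLoc's condensing strategy. It assumes a simple spatial-grid rule: partition the image into roughly as many cells as the match budget and keep the highest-scoring match in each cell, so that the retained 5% still covers the image evenly. The `Match` layout, the function name `condense_matches`, and the use of a per-match confidence score are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (u, v, score): pixel location of a 2D-3D match and its matching confidence.
Match = Tuple[float, float, float]


def condense_matches(matches: Sequence[Match], width: int, height: int,
                     keep_ratio: float = 0.05) -> List[int]:
    """Return indices of a spatially spread subset of at most keep_ratio * N matches."""
    n = len(matches)
    if n == 0:
        return []
    budget = max(1, round(n * keep_ratio))
    # Pick a square cell size so the grid has roughly `budget` cells.
    cell = math.sqrt(width * height / budget)

    best: Dict[Tuple[int, int], int] = {}  # grid cell -> index of its best match
    for i, (u, v, score) in enumerate(matches):
        key = (int(u // cell), int(v // cell))
        j = best.get(key)
        if j is None or score > matches[j][2]:
            best[key] = i

    # The grid can have slightly more cells than the budget; keep the strongest.
    kept = sorted(best.values(), key=lambda i: matches[i][2], reverse=True)[:budget]
    return sorted(kept)


if __name__ == "__main__":
    # Synthetic dense matches: one per pixel of a 100x100 image.
    dense = [(x + 0.5, y + 0.5, ((x * 31 + y * 17) % 97) / 97.0)
             for y in range(100) for x in range(100)]
    subset = condense_matches(dense, width=100, height=100)
    print(len(dense), "->", len(subset))
```

The returned indices would then select the 2D-3D correspondences passed to a robust PnP solver (for example a RANSAC-based one). Because each RANSAC iteration scores its hypothesis against every correspondence, shrinking the set reduces the per-iteration cost, which is the kind of saving the abstract attributes to condensation.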
Problem

Research questions and friction points this paper is trying to address.

visual localization
storage redundancy
computational latency
3D Gaussian Splatting
dense pose estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian Splatting
visual localization
compact representation
dense pose estimation
feature decoupling