🤖 AI Summary
This work addresses the challenge of balancing accuracy and efficiency in visual localization on resource-constrained devices by proposing an asymmetric distillation framework. In this framework, a large teacher model processes database images offline, while a compact student model handles query images online. The approach combines a geometry-driven matching objective with joint detector-descriptor distillation, so that student and teacher features can be aligned with parameter-free nearest-neighbor matching rather than a heavy learned matcher. Evaluated on HPatches, ScanNet, IMC2022, and Aachen, the method achieves up to 95% of the teacher model's localization accuracy with models an order of magnitude smaller, significantly outperforming existing lightweight baselines.
📝 Abstract
Precise, real-time visual localization is critical for applications like AR/VR and robotics, especially on resource-constrained edge devices such as smart glasses, where battery life and heat dissipation are primary concerns. While many efficient models exist, further reducing compute without sacrificing accuracy is essential for practical deployment. To address this, we propose asymmetric visual localization: a large Teacher model processes pre-mapped database images offline, while a lightweight Student model processes the query image online. This creates a challenge: features produced by two different models must be matched without resorting to heavy, learned matchers. We introduce AsymLoc, a novel distillation framework that aligns a Student to its Teacher through a combination of a geometry-driven matching objective and a joint detector-descriptor distillation objective, enabling fast, parameter-free nearest-neighbor matching. Extensive experiments on HPatches, ScanNet, IMC2022, and Aachen show that AsymLoc achieves up to 95% of the Teacher's localization accuracy with models an order of magnitude smaller, significantly outperforming existing baselines and establishing a new state-of-the-art efficiency-accuracy trade-off.
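To make the parameter-free matching step concrete, the sketch below shows one common form of nearest-neighbor matching: mutual nearest neighbors under cosine similarity. It is illustrative only and is not taken from the AsymLoc implementation. The function name `mutual_nearest_neighbors`, the descriptor shapes, and the toy data standing in for Teacher (database) and Student (query) descriptors are all assumptions, and the abstract does not state whether a mutual check is used.

```python
import numpy as np

def mutual_nearest_neighbors(query_desc, db_desc):
    """Return (query_idx, db_idx) pairs that are each other's nearest neighbor.

    Hypothetical helper: query_desc is (Nq, D), db_desc is (Nd, D).
    There are no learned parameters; matching uses cosine similarity only.
    """
    # L2-normalize so the dot product equals cosine similarity.
    q = query_desc / np.linalg.norm(query_desc, axis=1, keepdims=True)
    d = db_desc / np.linalg.norm(db_desc, axis=1, keepdims=True)
    sim = q @ d.T                      # (Nq, Nd) similarity matrix
    nn_q2d = sim.argmax(axis=1)        # best database match for each query
    nn_d2q = sim.argmax(axis=0)        # best query match for each database entry
    q_idx = np.arange(sim.shape[0])
    keep = nn_d2q[nn_q2d] == q_idx     # keep only mutually consistent pairs
    return np.stack([q_idx[keep], nn_q2d[keep]], axis=1)

# Toy data (assumption): "Student" descriptors are noisy copies of a few
# "Teacher" descriptors, mimicking a well-aligned descriptor space.
rng = np.random.default_rng(0)
db_desc = rng.normal(size=(5, 8))            # stand-in for Teacher descriptors
perm = np.array([3, 0, 4])                   # which database rows the queries see
query_desc = db_desc[perm] + 0.01 * rng.normal(size=(3, 8))

matches = mutual_nearest_neighbors(query_desc, db_desc)
print(matches)
```

The sketch only works because the two descriptor sets live in a shared space. In the asymmetric setting, producing that shared space across two different models is the job the distillation objectives are described as doing.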