AI Summary
This work proposes a lightweight, monocular vision-based approach to topological mapping and navigation that requires neither metrically accurate maps nor additional pre-training. Using AnyLoc, the method converts keyframes into global descriptors and builds a topological graph whose nodes represent locations in the environment. Navigation actions are chosen by comparing segmented images with the image associated with the target node. To the best of our knowledge, this is the first application of AnyLoc to monocular topological navigation, improving the system's generalization and computational efficiency. Experiments in both real-world and simulated environments demonstrate effective loop closure and navigation, with a 60.2% higher average success rate than a ResNet-based baseline and substantially lower time and memory overhead.
Abstract
This paper proposes a method for topological mapping and navigation using a monocular camera. Based on AnyLoc, keyframes are converted into descriptors to construct topological relationships, enabling loop detection and map building. Unlike metric maps, topological maps simplify path planning and navigation by representing environments with key nodes instead of precise coordinates. Actions for visual navigation are determined by comparing segmented images with the images associated with target nodes. The system relies solely on a monocular camera, ensuring fast map building and navigation using key nodes. Experiments show effective loop detection and navigation in real and simulated environments without pre-training. Compared to a ResNet-based method, this approach improves success rates by 60.2% on average while reducing time and space costs, offering a lightweight solution for robot and human navigation in various scenarios.
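The descriptor-based map building described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class `TopoMap`, its parameters `loop_threshold` and `min_gap`, and the use of cosine similarity with a fixed threshold are all assumptions made for illustration. Descriptors are plain lists of floats here; in the paper they would come from AnyLoc, which this sketch does not call.

```python
import math
from collections import deque


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TopoMap:
    """Hypothetical topological map: nodes hold descriptors, edges link places."""

    def __init__(self, loop_threshold=0.9, min_gap=2):
        self.descriptors = []   # node id -> global descriptor (assumed precomputed)
        self.edges = {}         # node id -> set of neighbouring node ids
        self.loop_threshold = loop_threshold  # assumed similarity cut-off
        self.min_gap = min_gap  # number of most recent nodes skipped in loop search

    def _link(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def add_keyframe(self, descriptor):
        """Add a node, link it to the previous node, return a loop match or None."""
        node = len(self.descriptors)
        self.descriptors.append(descriptor)
        self.edges[node] = set()
        if node > 0:
            self._link(node - 1, node)  # sequential edge along the trajectory
        best, best_sim = None, self.loop_threshold
        for other in range(node - self.min_gap):  # older nodes only
            sim = cosine_similarity(descriptor, self.descriptors[other])
            if sim >= best_sim:
                best, best_sim = other, sim
        if best is not None:
            self._link(best, node)  # loop-closure edge
        return best

    def shortest_path(self, start, goal):
        """Breadth-first search over nodes; returns a list of node ids or None."""
        parents = {start: None}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if current == goal:
                path = []
                while current is not None:
                    path.append(current)
                    current = parents[current]
                return path[::-1]
            for nxt in sorted(self.edges[current]):
                if nxt not in parents:
                    parents[nxt] = current
                    queue.append(nxt)
        return None
```

Feeding keyframe descriptors to `add_keyframe` in order grows the graph, and a returned node id signals a detected loop. Planning then reduces to a graph search over key nodes, which is why no metric coordinates are needed in this sketch.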