🤖 AI Summary
This work addresses visual place recognition with event cameras under challenging lighting conditions and real-time constraints. The authors propose EventGeM, a global-to-local feature fusion pipeline. Global patch embeddings are extracted from event histogram images with a pretrained ViT-S/16 to produce initial matches. Local keypoints detected with a pretrained MaxViT are then used to re-rank candidates through 2D-homography estimation with RANSAC. A pretrained vision foundation model for depth estimation adds a further re-ranking step by comparing structural similarity between references and queries. Evaluated on several benchmark datasets and lighting conditions, the method achieves state-of-the-art localization relative to the best available event-based place recognition method and runs in real time on a variety of compute architectures. It has also been deployed on a robotic platform for online localization from live event streams.
📝 Abstract
Dynamic vision sensors, also known as event cameras, are rapidly gaining popularity in robotics and computer vision because of their sparse activation and high temporal resolution. Event cameras have been used for robotic navigation and localization when accurate positioning must be updated on short, frequent time scales, or when energy efficiency is paramount. In this work, we present EventGeM, a state-of-the-art global-to-local feature fusion pipeline for event-based Visual Place Recognition. We use a pre-trained vision transformer (ViT-S/16) backbone to obtain global feature patch embeddings from event histogram images for initial match predictions. Local feature keypoints are then detected using a pre-trained MaxViT backbone for 2D-homography-based re-ranking with RANSAC. For additional re-ranking refinement, we use a pre-trained vision foundation model for depth estimation to compare structural similarity between references and queries. EventGeM achieves state-of-the-art localization compared with the best currently available event-based place recognition method across several benchmark datasets and lighting conditions, whilst running in real time on a variety of compute architectures. We demonstrate the capability of EventGeM in a real-world deployment on a robotic platform for online localization using event streams directly from an event camera. Project page: https://eventgemvpr.github.io/
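The abstract describes a coarse-to-fine pattern. Global descriptors shortlist candidate references, then finer local and depth-based scores reorder that shortlist. The following is a minimal, hypothetical sketch of this retrieve-then-re-rank pattern, not the EventGeM implementation. It assumes events arrive as (x, y, timestamp, polarity) tuples and builds a simple two-channel polarity count histogram; the paper's exact histogram construction is not specified here. Global descriptors are treated as plain vectors compared by cosine similarity. The re-ranking stage is abstracted as a caller-supplied score function, standing in for something like a RANSAC inlier count. The names `event_histogram`, `cosine_similarity`, and `retrieve_then_rerank` are illustrative only.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical event format: (x, y, timestamp, polarity), polarity > 0 for ON events.
Event = Tuple[int, int, float, int]


def event_histogram(events: Sequence[Event], width: int, height: int) -> List[List[List[int]]]:
    """Count events per pixel into a 2-channel image indexed [channel][row][col].

    Channel 0 counts positive-polarity events, channel 1 counts all others.
    Events outside the sensor bounds are ignored.
    """
    hist = [[[0] * width for _ in range(height)] for _ in range(2)]
    for x, y, _t, polarity in events:
        if 0 <= x < width and 0 <= y < height:
            channel = 0 if polarity > 0 else 1
            hist[channel][y][x] += 1
    return hist


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two descriptors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_then_rerank(
    query_desc: Sequence[float],
    ref_descs: Sequence[Sequence[float]],
    rerank_score: Callable[[int], float],
    top_k: int,
) -> List[int]:
    """Return reference indices: a global shortlist of size top_k, reordered by rerank_score."""
    # Stage 1: rank every reference by global-descriptor similarity and keep the top_k.
    by_global = sorted(
        range(len(ref_descs)),
        key=lambda i: cosine_similarity(query_desc, ref_descs[i]),
        reverse=True,
    )
    shortlist = by_global[:top_k]
    # Stage 2: reorder only the shortlist by a finer score (e.g. a RANSAC inlier count).
    # sorted() is stable even with reverse=True, so ties keep their global order.
    return sorted(shortlist, key=rerank_score, reverse=True)
```

As a usage example, take references `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]`, query `[1.0, 0.0]`, `top_k=2`, and re-ranking scores 5, 20, and 100. The global shortlist is `[0, 1]` and the re-ranked result is `[1, 0]`. Reference 2 never reaches re-ranking despite its high score. This is the usual trade-off of a two-stage design: the expensive geometric and depth checks run only on a few candidates, but a reference missed by the global stage cannot be recovered later.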