EventGeM: Global-to-Local Feature Matching for Event-Based Visual Place Recognition

📅 2026-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses visual place recognition with event cameras under challenging lighting conditions and real-time constraints. The authors propose an end-to-end framework that fuses global and local features: global descriptors are extracted with a pretrained ViT-S/16, while local keypoints are detected with a pretrained MaxViT. Candidate matches are re-ranked via RANSAC-based homography estimation for geometric consistency, then further refined by comparing the structural similarity of depth maps produced by a pretrained vision foundation model. Evaluated across multiple benchmark datasets, including scenarios with extreme illumination variation, the method achieves state-of-the-art performance and has been deployed on a robotic platform for real-time online localization.

📝 Abstract
Dynamic vision sensors, also known as event cameras, are rapidly rising in popularity for robotic and computer vision tasks due to their sparse activation and high temporal resolution. Event cameras have been used in robotic navigation and localization tasks where accurate positioning needs to occur on small and frequent time scales, or when energy concerns are paramount. In this work, we present EventGeM, a state-of-the-art global-to-local feature fusion pipeline for event-based Visual Place Recognition. We use a pre-trained vision transformer (ViT-S/16) backbone to obtain global feature embeddings from event histogram images for initial match predictions. Local feature keypoints are then detected using a pre-trained MaxViT backbone for 2D-homography-based re-ranking with RANSAC. For additional re-ranking refinement, we subsequently use a pre-trained vision foundation model for depth estimation to compare structural similarity between references and queries. Our work achieves state-of-the-art localization compared to the best currently available event-based place recognition methods across several benchmark datasets and lighting conditions, all whilst being fully capable of running in real time when deployed across a variety of compute architectures. We demonstrate the capability of EventGeM in a real-world deployment on a robotic platform for online localization using event streams directly from an event camera. Project page: https://eventgemvpr.github.io/
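The retrieve-then-re-rank pipeline the abstract describes can be sketched as follows. This is a hedged toy illustration, not the authors' implementation: the ViT-S/16, MaxViT, and depth backbones are stubbed out with plain NumPy arrays, the homography stage is omitted, and the single-window SSIM below is a simplification of the usual sliding-window formulation. All names and shapes are illustrative assumptions.

```python
# Toy sketch of global-to-local VPR: global descriptor retrieval, then
# re-ranking of the shortlist by depth-map structural similarity.
# Backbones (ViT-S/16, MaxViT, depth foundation model) are stubbed with
# random arrays; this only illustrates the pipeline shape.
import numpy as np

def global_retrieve(query_desc, ref_descs, top_k=3):
    """Stage 1: rank reference places by cosine similarity of their
    global descriptors (stand-ins for ViT embeddings of event histograms)."""
    q = query_desc / np.linalg.norm(query_desc)
    r = ref_descs / np.linalg.norm(ref_descs, axis=1, keepdims=True)
    sims = r @ q
    order = np.argsort(-sims)
    return order[:top_k], sims

def ssim_global(a, b, c1=1e-4, c2=9e-4):
    """Simplified (single-window) structural similarity between two
    depth maps, standing in for the paper's depth-based refinement."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2)
    )

def rerank_by_depth(query_depth, ref_depths, shortlist):
    """Stage 2: reorder the global shortlist by depth-map similarity."""
    scores = [ssim_global(query_depth, ref_depths[i]) for i in shortlist]
    return [shortlist[i] for i in np.argsort(scores)[::-1]]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref_descs = rng.normal(size=(10, 64))        # stand-in global descriptors
    ref_depths = rng.random((10, 32, 32))        # stand-in depth maps
    query_desc = ref_descs[4] + 0.05 * rng.normal(size=64)  # noisy revisit
    shortlist, _ = global_retrieve(query_desc, ref_descs)
    best = rerank_by_depth(ref_depths[4], ref_depths, list(shortlist))
    print(best[0])  # the revisited place (index 4) should win both stages
```

In a real system the descriptors would come from the learned backbones and the shortlist would additionally pass through keypoint matching with RANSAC homography verification before the depth-similarity refinement.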
Problem

Research questions and friction points this paper is trying to address.

event-based visual place recognition
visual localization
event cameras
real-time localization
robust place recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

event-based visual place recognition
global-to-local feature fusion
vision transformer
MaxViT
depth-aware re-ranking
Adam D. Hines
QUT Centre for Robotics, School of Electrical Engineering and Robotics, Queensland University of Technology, Brisbane, QLD 4000, Australia
Gokul B. Nair
PhD Candidate, QUT Centre for Robotics
Computer Vision, Robotics
Nicolás Marticorena
QUT Centre for Robotics, School of Electrical Engineering and Robotics, Queensland University of Technology, Brisbane, QLD 4000, Australia
Michael Milford
QUT Professor | Director, QUT Robotics Centre | ARC Laureate Fellow | Microsoft Fellow
Robotics, computational neuroscience, navigation, SLAM, RatSLAM
Tobias Fischer
DECRA Fellow, Senior Lecturer (US: Associate Professor), Queensland University of Technology (QUT)
Robotics, Computer Vision, Neuromorphic Systems, Robotic Vision, Neurorobotics