🤖 AI Summary
This work addresses feature matching with event cameras under large baselines, arbitrary viewpoints, and extreme motion, a setting in which existing methods struggle to produce reliable correspondences and do not transfer across datasets. The authors propose the first event-based feature matching model that can be deployed zero-shot on unseen datasets. The approach introduces a motion-robust, computationally efficient attention backbone that learns features at multiple temporal scales, coupled with a sparsity-aware event token selection strategy that keeps large-scale training computationally feasible. Trained on large-scale data from an event motion synthesis framework that provides wide-baseline supervision, the model generalizes without target-domain fine-tuning and improves on the previous best event feature matching methods by 37.7% across multiple benchmarks.
📝 Abstract
Event cameras have recently shown promising capabilities in instantaneous motion estimation due to their robustness to low light and fast motions. However, computing wide-baseline correspondence between two arbitrary views remains a significant challenge, since event appearance changes substantially with motion, and learning-based approaches are constrained by both scalability and limited wide-baseline supervision. We therefore introduce the first event matching model that achieves cross-dataset wide-baseline correspondence in a zero-shot manner: a single model trained once is deployed on unseen datasets without any target-domain fine-tuning or adaptation. To enable this capability, we introduce a motion-robust and computationally efficient attention backbone that learns multi-timescale features from event streams, augmented with sparsity-aware event token selection, making large-scale training on diverse wide-baseline supervision computationally feasible. To provide the supervision needed for wide-baseline generalization, we develop a robust event motion synthesis framework to generate large-scale event-matching datasets with augmented viewpoints, modalities, and motions. Extensive experiments across multiple benchmarks show that our framework achieves a 37.7% improvement over the previous best event feature matching methods. Code and data are available at: https://github.com/spikelab-jhu/Match-Any-Events.
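The abstract does not say how sparsity-aware token selection or the multi-timescale features are computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes events are `(x, y, t, polarity)` tuples, that a token corresponds to a square image patch, and that a patch is kept only if it received at least `min_events` events within a trailing time window. The names `patch_counts` and `select_tokens`, and the parameters `patch_size`, `min_events`, and `scales`, are all hypothetical.

```python
from collections import Counter

def patch_counts(events, patch_size, t_start, t_end):
    """Count events per (row, col) patch with timestamps in [t_start, t_end]."""
    counts = Counter()
    for x, y, t, _polarity in events:
        if t_start <= t <= t_end:
            counts[(y // patch_size, x // patch_size)] += 1
    return counts

def select_tokens(events, patch_size=8, min_events=2, scales=(1.0, 0.5, 0.25)):
    """For each temporal scale, return the sorted patches that have enough events.

    A scale s keeps only the most recent fraction s of the event window
    (assumption: trailing windows; other windowing schemes are possible).
    """
    if not events:
        return {s: [] for s in scales}
    t_min = min(e[2] for e in events)
    t_max = max(e[2] for e in events)
    duration = t_max - t_min
    selected = {}
    for s in scales:
        counts = patch_counts(events, patch_size, t_max - s * duration, t_max)
        # Sparsity-aware step: drop patches that are empty or nearly empty.
        selected[s] = sorted(p for p, c in counts.items() if c >= min_events)
    return selected

# Toy event stream: (x, y, t, polarity)
events = [
    (1, 1, 0.00, 1),
    (2, 1, 0.10, -1),
    (3, 2, 0.90, 1),
    (20, 20, 0.80, 1),
    (21, 20, 1.00, 1),
    (40, 5, 0.20, 1),
]
tokens = select_tokens(events)
```

In this toy stream, the patch containing a single isolated event is never selected, and shorter windows keep only patches with recent activity. Only the surviving patches would be passed to an attention backbone, which is where the compute savings would come from under these assumptions.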