🤖 AI Summary
Traditional video-based tracking methods lose targets under large displacements or nonlinear motion because of inter-frame blind spots and restrictive linear-motion assumptions. To address this, we propose an event-camera framework for tracking arbitrary points. The method combines event-stream processing, explicit motion modeling, and enhanced local matching. Key contributions include: (1) a motion-guidance module that incorporates kinematic features into local matching, resolving ambiguities caused by event sparsity; and (2) a variable-motion-aware module that produces temporally consistent responses across a wide range of velocities. Evaluated on one synthetic and two real-world event-based datasets, the approach outperforms state-of-the-art methods in accuracy and robustness. It also runs 150% faster at inference while keeping a competitive parameter count, demonstrating a strong efficiency-accuracy trade-off.
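To make the linear-motion failure mode concrete, the toy sketch below (an illustration, not the paper's method) predicts a point's next position by constant-velocity extrapolation from its two previous samples, for a point moving on a circle. The radius, angular speed, 30 fps frame interval, and 100 µs event time step are illustrative assumptions. For this motion the prediction error equals 2r(1 - cos(ω·Δt)), so it grows roughly with the square of the time step: a long blind interval between frames hurts far more than the fine time steps an event stream allows.

```python
import math

R = 10.0      # hypothetical circle radius
OMEGA = 2.0   # hypothetical angular speed (rad/s)

def circle(t):
    """True position of a point moving on a circle (nonlinear motion)."""
    return (R * math.cos(OMEGA * t), R * math.sin(OMEGA * t))

def constant_velocity_error(dt, t=1.0):
    """Predict the position at t + dt from samples at t - dt and t by
    assuming linear (constant-velocity) motion; return the error."""
    x0, y0 = circle(t - dt)
    x1, y1 = circle(t)
    px, py = x1 + (x1 - x0), y1 + (y1 - y0)  # linear extrapolation
    tx, ty = circle(t + dt)                  # true position
    return math.hypot(px - tx, py - ty)

frame_dt = 1 / 30   # hypothetical 30 fps frame interval
event_dt = 1e-4     # hypothetical 100 µs event time step
print(f"frame-rate error: {constant_velocity_error(frame_dt):.4f}")
print(f"event-rate error: {constant_velocity_error(event_dt):.2e}")
```

With these settings the frame-rate error is about 0.044 while the event-rate error is about 4e-7, in the same units as the radius.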
📝 Abstract
Tracking Any Point (TAP) plays a crucial role in motion analysis. Video-based approaches track points by iterative local matching, but they assume linear motion during the blind time between frames, which causes target points to be lost under large displacements or nonlinear motion. Event cameras, with their high temporal resolution and freedom from motion blur, provide continuous, fine-grained motion information and capture subtle variations with microsecond precision. This paper presents an event-based framework for tracking any point that addresses two challenges of event data, spatial sparsity and motion sensitivity, through two tailored modules. Specifically, to resolve ambiguities caused by event sparsity, a motion-guidance module incorporates kinematic features into the local matching process. In addition, a variable-motion-aware module produces temporally consistent responses that are insensitive to varying velocities, thereby improving matching precision. To validate the approach, a simulated event dataset for tracking any point is constructed and used in experiments alongside two real-world datasets. The experimental results show that the proposed method outperforms existing state-of-the-art (SOTA) methods. Moreover, it achieves 150% faster processing with a competitive number of model parameters.
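The toy below illustrates one step of local matching and why a motion cue helps under large displacement. It is a hypothetical sketch in Python with NumPy, not the paper's motion-guidance module: the feature map, the `local_match` helper, the search radius, and the velocity cue are all assumptions. A window search centered on the last estimate misses a target that moved farther than the search radius, while re-centering the window with a motion cue recovers it.

```python
import numpy as np

def local_match(feat, query, center, radius):
    """Search a (2*radius+1)^2 window around `center` and return the pixel
    whose feature has the highest dot-product similarity with `query`.
    Keeps `center` unless a strictly better match is found."""
    h, w, _ = feat.shape
    cy, cx = center
    best, best_score = center, float(feat[cy, cx] @ query)
    for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
            score = float(feat[y, x] @ query)
            if score > best_score:
                best, best_score = (y, x), score
    return best

# Toy scene: a single distinctive feature marks the target's true position.
H, W, D = 48, 48, 8
query = np.zeros(D)
query[0] = 1.0
feat = np.zeros((H, W, D))
true_next = (20, 36)
feat[true_next] = query

current = (20, 28)   # last estimate of the point
velocity = (0, 8)    # hypothetical motion cue, in pixels per step
radius = 4           # hypothetical search radius

# Plain matching: window centered on the last estimate (target is 8 px away).
plain = local_match(feat, query, current, radius)
# Motion-guided matching: window re-centered by the motion cue.
guided_center = (current[0] + velocity[0], current[1] + velocity[1])
guided = local_match(feat, query, guided_center, radius)
print(plain, guided)
```

Here `plain` stays at the stale estimate (20, 28) while `guided` lands on the target at (20, 36). In a real event-based tracker the motion cue would be estimated from the event stream rather than given, and matching would typically be repeated with refined centers; both are outside the scope of this toy.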