🤖 AI Summary
This work addresses the performance degradation of RGB-based object tracking in high-speed motion scenarios caused by motion blur and low illumination. To tackle this challenge, the authors propose SOR-Track, a novel framework that explicitly leverages directional geometric priors from event streams. The method introduces a Spatial Orthogonal Refinement (SOR) module, which uses local motion directions to guide orthogonal filters in extracting structural responses. Coupled with an asymmetric structural modulation mechanism, SOR-Track achieves physically informed alignment of the RGB and event modalities and effectively restores texture details. Experiments on the FE108 benchmark demonstrate that SOR-Track significantly outperforms existing RGB-Event fusion trackers, with particularly robust performance under motion blur and low-light conditions.
📝 Abstract
Robust visual object tracking (VOT) remains challenging in high-speed motion scenarios, where conventional RGB sensors suffer from severe motion blur and performance degradation. Event cameras, with microsecond temporal resolution and high dynamic range, provide complementary structural cues that can potentially compensate for these limitations. However, existing RGB-Event fusion methods typically treat event data as dense intensity representations and adopt black-box fusion strategies, failing to explicitly leverage the directional geometric priors inherently encoded in event streams to rectify degraded RGB features. To address this limitation, we propose SOR-Track, a streamlined framework for robust RGB-Event tracking based on Spatial Orthogonal Refinement (SOR). The core SOR module employs a set of orthogonal directional filters that are dynamically guided by local motion orientations to extract sharp, motion-consistent structural responses from event streams. These responses serve as geometric anchors that modulate and refine aliased RGB textures through an asymmetric structural modulation mechanism, thereby explicitly bridging the structural discrepancies between the two modalities. Extensive experiments on the large-scale FE108 benchmark demonstrate that SOR-Track consistently outperforms existing fusion-based trackers, particularly under motion blur and low-light conditions. Despite its simplicity, the proposed method offers a principled, physics-grounded approach to multi-modal feature alignment and texture rectification. The source code will be released at https://github.com/Event-AHU/OpenEvTracking.
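The abstract's pipeline — orthogonal directional filtering of the event map, orientation-guided recombination of the responses, and one-way (asymmetric) modulation of the RGB features — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the Sobel filter pair as the "orthogonal directional filters", the sigmoid gate, and the function name `spatial_orthogonal_refinement` are all assumptions made for the sketch.

```python
import numpy as np

def conv2d(x, k):
    """Naive 2D correlation with zero padding so output keeps input size."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

# A pair of orthogonal directional filters (Sobel horizontal/vertical gradients).
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
KY = KX.T

def spatial_orthogonal_refinement(event_feat, rgb_feat):
    """Sketch of an SOR-like step on single-channel feature maps:
    1) orthogonal filter responses on the event map,
    2) local motion orientation estimated from those responses,
    3) orientation-guided structural response (gradient along the motion direction),
    4) asymmetric modulation: event structure gates and refines RGB, never the reverse."""
    gx = conv2d(event_feat, KX)
    gy = conv2d(event_feat, KY)
    theta = np.arctan2(gy, gx)                         # local motion orientation
    struct = np.cos(theta) * gx + np.sin(theta) * gy   # response along the orientation
    gate = 1.0 / (1.0 + np.exp(-struct))               # sigmoid gate in (0, 1)
    return rgb_feat * gate + struct                    # scale-and-shift refinement of RGB
```

With an all-zero event map the structural response vanishes, the gate sits at 0.5, and the RGB features pass through merely rescaled, which makes the one-way nature of the modulation easy to verify.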