Spatial Orthogonal Refinement for Robust RGB-Event Visual Object Tracking

📅 2026-03-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance degradation of RGB-based object tracking in high-speed motion scenarios caused by motion blur and low illumination. To tackle this challenge, the authors propose SOR-Track, a novel framework that explicitly leverages directional geometric priors from event streams. The method introduces a spatially orthogonal refinement module, which employs local motion directions to guide orthogonal filters for extracting structural responses. Coupled with an asymmetric structure modulation mechanism, SOR-Track achieves physically informed alignment of RGB and event modalities and effectively restores texture details. Experiments on the FE108 benchmark demonstrate that SOR-Track significantly outperforms existing RGB-event fusion trackers, exhibiting particularly robust performance under motion blur and low-light conditions.
📝 Abstract
Robust visual object tracking (VOT) remains challenging in high-speed motion scenarios, where conventional RGB sensors suffer from severe motion blur and performance degradation. Event cameras, with microsecond temporal resolution and high dynamic range, provide complementary structural cues that can potentially compensate for these limitations. However, existing RGB-Event fusion methods typically treat event data as dense intensity representations and adopt black-box fusion strategies, failing to explicitly leverage the directional geometric priors inherently encoded in event streams to rectify degraded RGB features. To address this limitation, we propose SOR-Track, a streamlined framework for robust RGB-Event tracking based on Spatial Orthogonal Refinement (SOR). The core SOR module employs a set of orthogonal directional filters that are dynamically guided by local motion orientations to extract sharp, motion-consistent structural responses from event streams. These responses serve as geometric anchors that modulate and refine aliased RGB textures through an asymmetric structural modulation mechanism, thereby explicitly bridging the structural discrepancies between the two modalities. Extensive experiments on the large-scale FE108 benchmark demonstrate that SOR-Track consistently outperforms existing fusion-based trackers, particularly under motion blur and low-light conditions. Despite its simplicity, the proposed method offers a principled, physics-grounded approach to multi-modal feature alignment and texture rectification. The source code will be released at https://github.com/Event-AHU/OpenEvTracking
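To make the abstract's description concrete, the following is a minimal sketch of orientation-guided orthogonal refinement. It assumes Sobel-style kernels as the orthogonal filter pair and a simple additive modulation; all function names, the orientation-weighting scheme, and the `alpha` parameter are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def directional_filters():
    # A minimal orthogonal pair: horizontal and vertical Sobel kernels
    # (stand-ins for the paper's learned/guided directional filters).
    gx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    return gx, gx.T

def conv2d(img, kernel):
    # 'same'-size 2-D correlation via zero padding (dependency-free).
    p = kernel.shape[0] // 2
    padded = np.pad(img, p)
    out = np.zeros_like(img, dtype=np.float32)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(
                padded[i:i + kernel.shape[0], j:j + kernel.shape[1]] * kernel
            )
    return out

def sor_refine(rgb_feat, event_frame, alpha=0.5):
    """Sketch of spatially orthogonal refinement: weight the two
    orthogonal responses by the local motion orientation estimated
    from the event frame, then additively modulate the (possibly
    blurred) RGB feature map with the resulting structure map."""
    gx, gy = directional_filters()
    rx = conv2d(event_frame, gx)
    ry = conv2d(event_frame, gy)
    theta = np.arctan2(ry, rx)  # local motion orientation per pixel
    structure = np.abs(np.cos(theta)) * rx + np.abs(np.sin(theta)) * ry
    # Asymmetric modulation: event structure corrects the RGB branch,
    # never the reverse (the degraded RGB does not touch the events).
    return rgb_feat + alpha * structure
```

In this sketch, where the event frame is empty the structure map vanishes and the RGB features pass through unchanged; near motion edges in the event stream, the orientation-weighted responses inject sharp structural detail into the blurred RGB features.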
Problem

Research questions and friction points this paper is trying to address.

visual object tracking
motion blur
event cameras
RGB-Event fusion
geometric priors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatial Orthogonal Refinement
RGB-Event Fusion
Event Camera
Visual Object Tracking
Structural Modulation
Dexing Huang
Institute of Automation, Chinese Academy of Sciences
Medical Image Processing, Computer Vision, AIGC, VLMs
Shiao Wang
Anhui University
Deep Learning
Fan Zhang
School of Computer Science and Technology, Anhui University, Hefei 230601, China
Xiao Wang
School of Computer Science and Technology, Anhui University, Hefei 230601, China