🤖 AI Summary
This study addresses the challenge of robust long-term visual object tracking under dynamic appearance variations (e.g., illumination changes, pose shifts, and non-rigid deformations). Inspired by phase synchronization mechanisms in neuroscience, we propose a Complex-Valued Recurrent Neural Network (CV-RNN) that, for the first time, models neural phase synchronization as a learnable complex-valued dynamical process. This enables a decoupled representation of spatial location and appearance features, along with independent attentional modulation of each. Evaluated on FeatureTracker, a large-scale, precisely controlled visual tracking benchmark, our method achieves human-level tracking performance: it outperforms all existing deep learning models tested and exhibits behavioral patterns highly consistent with those of human subjects. Key contributions include: (1) a learnable phase synchronization mechanism grounded in complex-valued dynamics; (2) a feature-location decoupling paradigm for attention control; and (3) empirical evidence that phase synchronization is a computationally feasible neural substrate for appearance-robust tracking.
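To make the core idea concrete, here is a minimal sketch of a complex-valued recurrent update in which the magnitude of each hidden unit carries feature (appearance) evidence while the phase provides a separate channel that can synchronize units attending to the same object. This is an illustrative toy, not the paper's CV-RNN: the dimensions, initialization, and the magnitude-phase split nonlinearity are all assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- not taken from the paper.
n_in, n_hid = 8, 16

# Complex-valued input and recurrent weights (illustrative scaling).
W = (rng.standard_normal((n_hid, n_hid))
     + 1j * rng.standard_normal((n_hid, n_hid))) / np.sqrt(n_hid)
U = (rng.standard_normal((n_hid, n_in))
     + 1j * rng.standard_normal((n_hid, n_in))) / np.sqrt(n_in)

def step(z, x):
    """One recurrent update of the complex hidden state z given real input x.

    The magnitude |z| encodes feature evidence; the phase angle(z) is a
    separate degree of freedom that learning can use to bind units to the
    same object (phase synchronization).
    """
    pre = W @ z + U @ x
    # Split nonlinearity: squash the magnitude, preserve the phase --
    # a common design choice in complex-valued RNNs (assumed here).
    mag = np.tanh(np.abs(pre))
    phase = np.angle(pre)
    return mag * np.exp(1j * phase)

# Run a few steps on random input frames.
z = np.zeros(n_hid, dtype=complex)
for t in range(5):
    x = rng.standard_normal(n_in)
    z = step(z, x)

# Read out appearance (magnitude) and grouping/attention (phase) separately.
appearance = np.abs(z)
grouping = np.angle(z)
```

The point of the sketch is the readout at the end: because magnitude and phase are independent coordinates of the same complex state, an attention mechanism can modulate phase (which units belong to the tracked object) without disturbing magnitude (what the object currently looks like), which is the feature-location decoupling the summary describes.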
📝 Abstract
Objects we encounter often change appearance as we interact with them. Changes in illumination (shadows), object pose, or the movement of non-rigid objects can drastically alter available image features. How do biological visual systems track objects as they change? One plausible mechanism involves attention routines that reason about the locations of objects independently of their appearances -- a capability that prominent neuroscience theories have associated with computing through neural synchrony. Here, we describe a novel deep learning circuit that can learn to precisely control attention to features separately from their location in the world through neural synchrony: the complex-valued recurrent neural network (CV-RNN). Next, we compare object tracking in humans, the CV-RNN, and other deep neural networks (DNNs), using FeatureTracker: a large-scale challenge that asks observers to track objects as their locations and appearances change in precisely controlled ways. While humans effortlessly solved FeatureTracker, state-of-the-art DNNs did not. In contrast, our CV-RNN behaved similarly to humans on the challenge, providing a computational proof-of-concept for the role of phase synchronization as a neural substrate for tracking appearance-morphing objects as they move about.