Neuromorphic Eye Tracking for Low-Latency Pupil Detection

📅 2025-12-10
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Eye tracking for AR/VR wearables demands ultra-low latency (<10 ms) and milliwatt-level power consumption, yet conventional frame-based approaches suffer from motion blur, high computational overhead, and insufficient temporal resolution. This paper introduces the first lightweight event-driven spiking neural network (SNN) for eye tracking: it replaces RNNs and attention modules with leaky integrate-and-fire (LIF) neurons and pairs neuromorphic event cameras with depthwise separable convolutions to achieve millisecond-scale pupil localization. The model achieves high accuracy, with a mean error of 3.7–4.1 pixels, approaching that of the specialized Retina system (3.24 pixels), while reducing model size by 20×, theoretical computation by 850×, and power consumption to only 3.9–4.9 mW. End-to-end latency is as low as 3 ms at 1 kHz. To our knowledge, this is the first work to jointly optimize accuracy, ultra-low power, and high temporal fidelity in neuromorphic eye tracking.
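The LIF neurons that replace the RNN and attention modules can be sketched with a minimal discrete-time update: leak the membrane potential, integrate the input, spike when a threshold is crossed, then reset. This is an illustrative toy model only, not the authors' implementation; the parameter names `beta` (leak factor) and `v_th` (firing threshold) are assumptions.

```python
# Minimal leaky integrate-and-fire (LIF) neuron sketch.
# Illustrative only: parameter names and the soft-reset rule are
# assumptions, not taken from the paper's implementation.

def lif_step(v, x, beta=0.9, v_th=1.0):
    """One discrete LIF update: leak, integrate input, spike, soft reset."""
    v = beta * v + x              # leaky integration of input current
    spike = 1 if v >= v_th else 0
    if spike:
        v -= v_th                 # soft reset: subtract the threshold
    return v, spike

def lif_run(inputs, beta=0.9, v_th=1.0):
    """Feed an input sequence through one LIF neuron; return its spike train."""
    v, spikes = 0.0, []
    for x in inputs:
        v, s = lif_step(v, x, beta, v_th)
        spikes.append(s)
    return spikes

print(lif_run([0.6, 0.6, 0.6]))   # -> [0, 1, 0]
```

Because state lives in a single scalar per neuron and the update is purely local, a layer of such units can carry temporal context far more cheaply than a recurrent or attention block, which is the efficiency argument the summary makes.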

๐Ÿ“ Abstract
Eye tracking for wearable systems demands low latency and milliwatt-level power, but conventional frame-based pipelines struggle with motion blur, high compute cost, and limited temporal resolution. Such capabilities are vital for enabling seamless and responsive interaction in emerging technologies like augmented reality (AR) and virtual reality (VR), where understanding user gaze is key to immersion and interface design. Neuromorphic sensors and spiking neural networks (SNNs) offer a promising alternative, yet existing SNN approaches are either too specialized or fall short of the performance of modern ANN architectures. This paper presents a neuromorphic version of top-performing event-based eye-tracking models, replacing their recurrent and attention modules with lightweight LIF layers and exploiting depth-wise separable convolutions to reduce model complexity. Our models obtain 3.7–4.1 px mean error, approaching the accuracy of the application-specific neuromorphic system, Retina (3.24 px), while reducing model size by 20× and theoretical compute by 850×, compared to the closest ANN variant of the proposed model. These efficient variants are projected to operate at an estimated 3.9–4.9 mW with 3 ms latency at 1 kHz. The present results indicate that high-performing event-based eye-tracking architectures can be redesigned as SNNs with substantial efficiency gains, while retaining accuracy suitable for real-time wearable deployment.
Problem

Research questions and friction points this paper is trying to address.

Develop low-latency, low-power eye tracking for AR/VR wearables
Overcome motion blur and high compute in conventional eye tracking
Redesign event-based models as efficient spiking neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Replacing recurrent and attention modules with lightweight LIF layers
Using depth-wise separable convolutions to reduce model complexity
Achieving low latency and milliwatt-level power with SNN redesign
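The parameter savings from depth-wise separable convolutions follow from simple arithmetic: a standard k×k convolution couples every input channel to every output channel, while the separable form factors this into a cheap per-channel spatial filter plus a 1×1 pointwise mix. A back-of-the-envelope sketch (channel counts chosen purely for illustration, biases omitted):

```python
# Weight-count comparison: standard vs. depth-wise separable convolution.
# Channel sizes below are illustrative, not taken from the paper's model.

def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def dw_separable_params(c_in, c_out, k):
    """Depth-wise k x k conv (one filter per input channel)
    followed by a 1x1 pointwise conv mixing channels."""
    return c_in * k * k + c_in * c_out

c_in, c_out, k = 64, 128, 3
std = conv_params(c_in, c_out, k)           # 64*128*9  = 73728
sep = dw_separable_params(c_in, c_out, k)   # 64*9 + 64*128 = 8768
print(std, sep, round(std / sep, 1))        # roughly 8.4x fewer weights
```

The reduction factor grows with the kernel size and output channel count, which, combined with replacing recurrence and attention by LIF layers, is consistent with the 20× model-size and 850× compute reductions reported above.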
🔎 Similar Papers
2024-09-27 · Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies · Citations: 3