🤖 AI Summary
Problem: Existing event-camera-based eye-tracking methods rely on GPU acceleration, making them unsuitable for resource-constrained embedded devices such as smart glasses.
Method: We propose a lightweight, fully event-driven convolutional neural network architecture optimized for microcontrollers. It comprises two low-complexity models, a grid-based classifier and a pixel-level regressor, and supports end-to-end training, evaluation, and INT8 quantization. The system processes raw event streams directly, eliminating image reconstruction and achieving microsecond-level inference latency with significantly reduced power consumption.
Contribution/Results: Evaluated on public event-based datasets, our approach matches the accuracy of state-of-the-art GPU-dependent methods. It is the first fully event-driven eye-tracking system capable of real-time operation on microcontroller units (MCUs), successfully deployed on STM32H7-series platforms. This work establishes a viable pathway for edge intelligence in wearable eye-tracking applications.
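The INT8 quantization mentioned above can be illustrated with a minimal sketch of symmetric per-tensor post-training quantization: floats are mapped to signed 8-bit integers via a single scale factor. This is a generic scheme for intuition only; the paper's actual quantization pipeline and calibration details are not specified here and may differ.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0  # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the INT8 representation."""
    return q.astype(np.float32) * scale

# Toy weight tensor: quantize, then check the reconstruction error.
w = np.array([0.5, -1.27, 0.01, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

On an MCU, only `q` (1 byte per weight) and `s` are stored, cutting memory 4x versus FP32 and enabling integer-only arithmetic.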
📝 Abstract
Event-based cameras are becoming a popular solution for efficient, low-power eye tracking. Because event data is sparse and asynchronous, it requires less processing power and offers latencies in the microsecond range. However, many existing solutions are only validated on powerful GPUs and are never deployed on real embedded devices. In this paper, we present EETnet, a convolutional neural network designed for eye tracking using purely event-based data, capable of running on microcontrollers with limited resources. Additionally, we outline a methodology to train, evaluate, and quantize the network using a public dataset. Finally, we propose two versions of the architecture: a classification model that detects the pupil on a grid superimposed on the original image, and a regression model that operates at the pixel level.
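The difference between the two output spaces, grid-cell classification versus pixel-level regression, can be sketched as a coordinate mapping. The sensor resolution and grid size below are assumptions for illustration, not values taken from the paper.

```python
# Hypothetical setup: map a pupil center (x, y) to a grid-cell class index
# and back to the cell center. The classifier predicts one of GRID*GRID
# classes; the regressor would instead output (x, y) directly in pixels.
W, H = 640, 480   # assumed sensor resolution
GRID = 8          # assumed number of cells per side

def to_cell(x, y):
    """Pixel coordinates -> class index in [0, GRID*GRID)."""
    cx = min(int(x * GRID / W), GRID - 1)
    cy = min(int(y * GRID / H), GRID - 1)
    return cy * GRID + cx

def cell_center(idx):
    """Class index -> pixel coordinates of the cell center."""
    cy, cx = divmod(idx, GRID)
    return ((cx + 0.5) * W / GRID, (cy + 0.5) * H / GRID)
```

The classifier's localization error is bounded by the cell size (here 80 x 60 pixels), which is why the pixel-level regression variant exists as the finer-grained alternative.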