🤖 AI Summary
Addressing the challenge of achieving real-time performance, low latency, and ultra-low power consumption for eye tracking on resource-constrained microcontrollers, this work proposes a hardware–algorithm co-optimized, event-driven eye-tracking system. The system integrates a DVXplorer Micro dynamic vision sensor with an STM32N6 microcontroller, employs a lightweight end-to-end CNN model, and leverages an on-chip neural network accelerator for efficient edge inference. It achieves an inference latency of only 385 μs—sub-millisecond responsiveness—while attaining a mean localization error of 5.99 pixels on the Ini-30 dataset, an energy cost of 155 μJ per inference, and a computational throughput of 52 MAC/cycle. To the best of our knowledge, this is the first real-time, event-camera-based eye-tracking system deployed on an ultra-low-power MCU platform. The work establishes a practical, high-throughput, low-overhead solution for embedded human–computer interaction.
📝 Abstract
This paper presents a novel event-based eye-tracking system deployed on a resource-constrained microcontroller, addressing the challenges of real-time, low-latency, and low-power performance in embedded systems. The system leverages a Dynamic Vision Sensor (DVS), specifically the DVXplorer Micro, with an average temporal resolution of 200 μs, to capture rapid eye movements with extremely low latency. It is implemented on the STM32N6, a recent low-power, high-performance microcontroller from STMicroelectronics featuring an 800 MHz Arm Cortex-M55 core and a dedicated AI hardware accelerator, the Neural-ART Accelerator, enabling real-time inference at milliwatt-scale power consumption. The paper proposes a hardware-aware and sensor-aware compact Convolutional Neural Network (CNN) optimized for event-based data and deployed at the edge, achieving a mean pupil prediction error of 5.99 pixels and a median error of 5.73 pixels on the Ini-30 dataset. The system achieves an end-to-end inference latency of just 385 μs and a neural network throughput of 52 Multiply-Accumulate (MAC) operations per cycle while consuming just 155 μJ of energy. This approach enables a fully embedded, energy-efficient eye-tracking solution suitable for applications such as smart glasses and wearable devices.
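The reported figures can be cross-checked with some back-of-envelope arithmetic: energy per inference divided by latency gives the average power draw during inference, and MAC/cycle times the clock rate gives peak compute throughput. This is a sketch using only the numbers quoted above; treating the accelerator clock as equal to the 800 MHz core clock is an assumption, since the abstract states only the Cortex-M55 frequency.

```python
# Sanity-check arithmetic on the figures reported in the abstract.
energy_uj = 155.0       # energy per inference (abstract)
latency_us = 385.0      # end-to-end inference latency (abstract)
macs_per_cycle = 52.0   # reported accelerator throughput (abstract)
clock_hz = 800e6        # ASSUMPTION: accelerator runs at the 800 MHz core clock

# μJ / μs = W, so multiply by 1000 for mW.
avg_power_mw = energy_uj / latency_us * 1000

# MAC/cycle × cycles/s = MAC/s; scale to GMAC/s.
gmac_per_s = macs_per_cycle * clock_hz / 1e9

print(f"average power during inference ~ {avg_power_mw:.0f} mW")  # ~403 mW
print(f"peak throughput ~ {gmac_per_s:.1f} GMAC/s")               # 41.6 GMAC/s
```

Note the implied ~403 mW is the burst power *during* the 385 μs inference window; with inference running only on demand, the duty-cycled average power can sit far lower, which is how such a system stays in the milliwatt regime overall.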