AI Summary
This work addresses the challenges of inaccurate cognitive load estimation in real-time eye tracking, primarily caused by frequent missing data (such as blinks and tracking failures) and inefficient modeling of long-range temporal dependencies. To overcome these limitations, we propose MambaGaze, a novel framework that explicitly models the uncertainty inherent in missing observations and irregular temporal intervals through XMD encoding, while leveraging a linear-complexity bidirectional Mamba-2 architecture to efficiently capture long-range dependencies. Evaluated on the CLARE and CL-Drive datasets, MambaGaze achieves accuracies of 76.8% and 73.1%, respectively, outperforming CNN- and Transformer-based baselines by 4-12 percentage points. Furthermore, the model supports edge deployment, delivering real-time inference at 43-68 FPS on Jetson platforms with power consumption below 7.5 W.
Abstract
Real-time cognitive load assessment from eye-tracking signals could enable adaptive human-centered AI in safety-critical applications such as driver vigilance monitoring or automated flight deck assistance, yet two challenges persist: handling frequent data missingness from blinks and tracking failures, and efficiently modeling long-range temporal dependencies. We propose MambaGaze, a framework that addresses these challenges through 1) XMD encoding, which augments raw features with observation masks and time-deltas to explicitly model data uncertainty, and 2) bidirectional Mamba-2, which captures temporal dependencies with linear computational complexity. Experiments on the CLARE and CL-Drive datasets under leave-one-subject-out evaluation show that MambaGaze achieves 76.8% and 73.1% accuracy, respectively, outperforming CNN, Transformer, ResNet, and VGG baselines by 4-12 percentage points. Edge deployment benchmarks on NVIDIA Jetson platforms demonstrate real-time inference at 43-68 FPS with power consumption below 7.5 W, confirming feasibility for wearable cognitive load monitoring.
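
To make the XMD idea concrete, the following is a minimal sketch of one plausible way to augment raw features (X) with observation masks (M) and time-deltas (D). The abstract does not give the exact formulation, so the details here are assumptions: missing samples are marked as `None` or NaN, gaps are filled by carrying the last valid value forward, and each delta is the time elapsed since that feature was last observed (a convention common in mask-and-delta encodings for irregular time series). The function name `xmd_encode`, its parameters, and the example features are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def xmd_encode(
    samples: Sequence[Sequence[Optional[float]]],
    timestamps: Sequence[float],
    fill_value: float = 0.0,
) -> List[List[float]]:
    """Return one row per time step laid out as [X..., M..., D...].

    X: feature values, with gaps filled by the last valid value
       (assumed imputation; fill_value is used before any observation).
    M: 1.0 if the feature was observed at this step, else 0.0.
    D: time elapsed since the feature was last observed (0.0 at step 0).
    """
    n_features = len(samples[0]) if samples else 0
    last_value = [fill_value] * n_features
    prev_mask = [1.0] * n_features
    prev_delta = [0.0] * n_features
    encoded: List[List[float]] = []

    for t, row in enumerate(samples):
        dt = 0.0 if t == 0 else timestamps[t] - timestamps[t - 1]
        x, m, d = [], [], []
        for j, value in enumerate(row):
            observed = value is not None and not math.isnan(value)
            if t == 0:
                delta = 0.0
            elif prev_mask[j] == 1.0:
                # Previous step was observed: the gap is just one interval.
                delta = dt
            else:
                # Previous step was missing: keep accumulating the gap.
                delta = dt + prev_delta[j]
            if observed:
                last_value[j] = value
            x.append(last_value[j])
            m.append(1.0 if observed else 0.0)
            d.append(delta)
        prev_mask, prev_delta = m, d
        encoded.append(x + m + d)
    return encoded


# Hypothetical example: [pupil diameter, gaze x], timestamps in milliseconds.
# The second and third samples simulate a blink and a tracking dropout.
samples = [[3.1, 0.5], [None, 0.6], [None, None], [3.4, 0.7]]
timestamps = [0.0, 20.0, 40.0, 100.0]
for row in xmd_encode(samples, timestamps):
    print(row)
```

With this layout, a sequence model receives three times as many input channels as the raw signal has, and it can distinguish a genuinely stable pupil reading from a value that was merely carried across a blink.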