🤖 AI Summary
Current gaze estimation methods lag behind commercial systems in lightweight model design, real-time performance, privacy preservation, and robustness under real-world conditions—particularly suffering accuracy degradation under head motion. This paper introduces the first browser-native lightweight eye-tracking framework, integrating on-device few-shot personalized calibration (≤9 calibration points), meta-learning-driven model adaptation, monocular image-based head pose modeling, and WebGL-accelerated inference. The framework operates entirely client-side—eliminating cloud dependency—to ensure privacy, ultra-low latency, and high accuracy. Evaluated on the GazeCapture dataset, it achieves a state-of-the-art error of 2.32 cm. On an iPhone 14, it attains real-time inference at 2.4 ms per frame. Its plug-and-play cross-user deployment capability bridges the gap between academic models and industrial-grade solutions in both performance and practicality.
📝 Abstract
With advancements in AI, new gaze estimation methods are exceeding state-of-the-art (SOTA) benchmarks, but their real-world application reveals a gap with commercial eye-tracking solutions. Factors like model size, inference time, and privacy often go unaddressed. Meanwhile, webcam-based eye-tracking methods lack sufficient accuracy, in particular due to head movement. To tackle these issues, we introduce WebEyeTrack, a framework that integrates lightweight SOTA gaze estimation models directly into the browser. It incorporates model-based head pose estimation and on-device few-shot learning with as few as nine calibration samples (k ≤ 9). WebEyeTrack adapts to new users, achieving SOTA performance with an error margin of 2.32 cm on GazeCapture and real-time inference speeds of 2.4 milliseconds on an iPhone 14. Our open-source code is available at https://github.com/RedForestAi/WebEyeTrack.
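To give a flavor of the on-device personalization step described above, the sketch below fits a per-axis linear correction from k ≤ 9 paired (model prediction, ground-truth) calibration points. This is a hypothetical illustration, not the actual WebEyeTrack API or its meta-learning procedure; all names and the choice of ordinary least squares are assumptions for exposition.

```typescript
// Hypothetical on-device calibration sketch (NOT the WebEyeTrack API):
// fit a per-axis linear correction from a handful of calibration samples.
type Point = { x: number; y: number };

// Fit target = a * pred + b for one axis via ordinary least squares.
function fitAxis(pred: number[], target: number[]): [number, number] {
  const n = pred.length;
  const mx = pred.reduce((s, v) => s + v, 0) / n;
  const my = target.reduce((s, v) => s + v, 0) / n;
  let num = 0, den = 0;
  for (let i = 0; i < n; i++) {
    num += (pred[i] - mx) * (target[i] - my);
    den += (pred[i] - mx) ** 2;
  }
  const a = den === 0 ? 1 : num / den; // fall back to identity slope
  return [a, my - a * mx];
}

// Build a corrector from <= 9 (prediction, ground truth) sample pairs.
function calibrate(preds: Point[], targets: Point[]): (p: Point) => Point {
  const [ax, bx] = fitAxis(preds.map(p => p.x), targets.map(p => p.x));
  const [ay, by] = fitAxis(preds.map(p => p.y), targets.map(p => p.y));
  return (p) => ({ x: ax * p.x + bx, y: ay * p.y + by });
}
```

In a browser deployment the calibration samples would be collected client-side (e.g., while the user fixates on displayed targets), so no gaze data ever leaves the device—consistent with the privacy goal the abstract emphasizes.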