AI Summary
This work tackles the limited accuracy of appearance-based eye tracking on consumer-grade devices. Modern high-resolution user-facing cameras can capture the reflection of the device's screen in the user's eyes, but the near-infinite variety of screen content makes this reflection difficult to exploit on its own. The authors propose, for the first time, to use the known screen display content as a prior, enabling content-aware, robust segmentation of the screen reflection in the eye; the position and size of the segmented region then encode the user's point of regard. Integrating this high-resolution corneal-reflection signal with an appearance-based baseline model reduces mean tracking error by roughly 8% in the main experiments. When the camera is positioned at the bottom of the device, error drops by a further 10-20%, pushing past the accuracy of a purely appearance-based approach.
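No implementation is given here, so as a rough illustration of what content-aware reflection segmentation could look like, the sketch below matches a heavily downscaled copy of the current framebuffer against a high-resolution eye crop using OpenCV template matching. The function name `find_screen_reflection`, the candidate scale set, and the matching strategy are all our assumptions, not the paper's method; in particular, plain template matching ignores the warping introduced by corneal curvature.

```python
# Minimal sketch (assumed pipeline, not the authors' code): find the
# screen's reflection in an eye crop by correlating a tiny downscaled
# copy of the known screen frame against the image.
import cv2
import numpy as np

def find_screen_reflection(eye_crop_gray: np.ndarray,
                           screen_frame_gray: np.ndarray,
                           scales=(0.02, 0.03, 0.04)):
    """Return (x, y, w, h) of the best reflection candidate, or None.

    Both inputs are uint8 grayscale. The reflection is tiny relative to
    the screen, so we search a few scales and keep the strongest
    normalized-correlation response.
    """
    best_score, best_box = -1.0, None
    for s in scales:
        templ = cv2.resize(screen_frame_gray, None, fx=s, fy=s,
                           interpolation=cv2.INTER_AREA)
        th, tw = templ.shape[:2]
        if th >= eye_crop_gray.shape[0] or tw >= eye_crop_gray.shape[1]:
            continue  # template must be smaller than the search image
        res = cv2.matchTemplate(eye_crop_gray, templ, cv2.TM_CCOEFF_NORMED)
        _, score, _, (x, y) = cv2.minMaxLoc(res)
        if score > best_score:
            best_score, best_box = score, (x, y, tw, th)
    return best_box
```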
Abstract
We present a new and accurate approach for gaze estimation on consumer computing devices. We take advantage of continued strides in the quality of user-facing cameras found in, e.g., smartphones, laptops, and desktops (4K or greater in high-end devices), such that it is now possible to capture the 2D reflection of a device's screen in the user's eyes. This alone is insufficient for accurate gaze tracking due to the near-infinite variety of screen content. Crucially, however, the device knows what is being displayed on its own screen; in this work, we show that this information allows for robust segmentation of the reflection, whose location and size encode the user's screen-relative gaze target. We explore several strategies to leverage this useful signal, quantifying performance in a user study. Our best-performing model reduces mean tracking error by ~8% compared to a baseline appearance-based model. A supplemental study reveals an additional 10-20% improvement if the gaze-tracking camera is located at the bottom of the device.
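The abstract states that the reflection's location and size encode the screen-relative gaze target but does not say how that mapping is computed. One simple possibility, shown below purely as an assumption of ours, is a per-user calibrated least-squares fit from the reflection box features to on-screen coordinates; the names `fit_gaze_mapper` and `predict_gaze` are hypothetical.

```python
# Illustrative sketch (our assumption, not the paper's model): after a
# short calibration, map the reflection's (x, y, w, h) box to a 2D
# point of regard with an affine least-squares fit.
import numpy as np

def fit_gaze_mapper(boxes: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """boxes: (N, 4) reflection boxes; targets: (N, 2) known screen points."""
    X = np.hstack([boxes, np.ones((len(boxes), 1))])  # append bias column
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)   # (5, 2) weight matrix
    return W

def predict_gaze(W: np.ndarray, box: np.ndarray) -> np.ndarray:
    """Return the (2,) estimated on-screen point of regard for one box."""
    return np.append(box, 1.0) @ W
```

A linear fit like this ignores corneal curvature and head motion, which is presumably why the paper combines the reflection signal with an appearance-based model rather than using it alone.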