🤖 AI Summary
This work addresses the limited generalization of existing appearance-based gaze estimation methods in open-domain scenarios, such as those involving eyeglasses or varying illumination, a limitation that stems from insufficient training-data diversity and inconsistent labels across datasets. To overcome these challenges without requiring additional manual annotation, we propose a lightweight framework that enhances data diversity through synthetic augmentation with glasses, face masks, and complex lighting conditions. The gaze regression task is reformulated as a multi-task learning problem that integrates multi-view supervised contrastive learning, discretized label classification, and eye-region segmentation. Despite having less than 1% of the parameters of the state-of-the-art UniGaze-H model, our approach achieves comparable generalization performance. Furthermore, we introduce the first robustness evaluation benchmark for gaze estimation under challenging real-world conditions, and the model's compact size enables high-accuracy, real-time tracking on mobile devices.
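As a concrete illustration of the augmentation step, the sketch below applies lighting jitter and alpha-composites a wearable overlay onto a face crop in PyTorch. The transform ranges, the 50% overlay probability, and the pre-aligned RGBA glasses/mask assets are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of the augmentation ensemble. The exact glasses/mask
# synthesis and lighting model used in the paper are not specified here;
# all parameter ranges below are assumptions.
import random
import torch
import torchvision.transforms.functional as TF

def augment_face(img, overlays):
    """img: (3, H, W) float tensor in [0, 1]; overlays: list of RGBA
    (4, H, W) tensors holding pre-aligned glasses / face-mask textures."""
    # Complex lighting: random brightness, contrast, and gamma jitter.
    img = TF.adjust_brightness(img, random.uniform(0.4, 1.6))
    img = TF.adjust_contrast(img, random.uniform(0.6, 1.4))
    img = TF.adjust_gamma(img, random.uniform(0.7, 1.5))
    # Wearable synthesis: alpha-composite a randomly chosen overlay.
    if overlays and random.random() < 0.5:
        rgba = random.choice(overlays)
        rgb, alpha = rgba[:3], rgba[3:4]
        img = alpha * rgb + (1.0 - alpha) * img
    return img.clamp(0.0, 1.0)
```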
📝 Abstract
Appearance-based gaze estimation (AGE) has achieved remarkable performance in constrained settings, yet we reveal a significant generalization gap: existing AGE models often fail in practical, unconstrained scenarios, particularly those involving facial wearables and poor lighting conditions. We attribute this failure to two core factors: limited image diversity and inconsistent label fidelity across datasets, especially along the pitch axis. To address these issues, we propose a robust AGE framework that enhances generalization without requiring additional human-annotated data. First, we expand the image manifold via an ensemble of augmentation techniques, including synthesis of eyeglasses, masks, and varied lighting. Second, to mitigate the impact of anisotropic inter-dataset label deviation, we reformulate gaze regression as a multi-task learning problem, incorporating multi-view supervised contrastive (SupCon) learning, discretized label classification, and eye-region segmentation as auxiliary objectives. To rigorously validate our approach, we curate new benchmark datasets designed to evaluate gaze robustness under challenging conditions, a dimension largely overlooked by existing evaluation protocols. Our lightweight MobileNet-based model achieves generalization performance competitive with the state-of-the-art (SOTA) UniGaze-H while using less than 1% of its parameters, enabling high-fidelity, real-time gaze tracking on mobile devices.
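To make the multi-task formulation concrete, here is a minimal PyTorch sketch of how the regression loss and the three auxiliary objectives might be combined. The loss weights, the 2-degree bin width, the head names, and the simplified two-view contrastive term (full SupCon additionally treats same-label samples as positives) are assumptions, not the paper's exact formulation.

```python
# Sketch of the multi-task objective: gaze regression plus the three
# auxiliary losses named in the abstract. Weights, bin width, and head
# shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

N_BINS = 90  # assumed: 2-degree bins covering [-90, 90) degrees

def discretize(angles_deg):
    """Map continuous pitch/yaw angles (B, 2) to integer bin indices."""
    return ((angles_deg + 90.0) / 2.0).long().clamp(0, N_BINS - 1)

def supcon_loss(z1, z2, temperature=0.1):
    """Simplified two-view contrastive loss: positives are the two
    augmented views of the same sample."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature            # (B, B) similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

def total_loss(out, batch, w=(1.0, 0.5, 0.5, 0.5)):
    """out: dict with 'gaze' (B, 2), 'emb1'/'emb2' (B, D) projections of
    two views, 'pitch_logits'/'yaw_logits' (B, N_BINS), 'seg' (B, 1, H, W)."""
    l_reg = F.l1_loss(out['gaze'], batch['gaze_deg'])
    l_con = supcon_loss(out['emb1'], out['emb2'])
    bins = discretize(batch['gaze_deg'])
    l_cls = (F.cross_entropy(out['pitch_logits'], bins[:, 0]) +
             F.cross_entropy(out['yaw_logits'], bins[:, 1]))
    l_seg = F.binary_cross_entropy_with_logits(out['seg'], batch['eye_mask'])
    return w[0] * l_reg + w[1] * l_con + w[2] * l_cls + w[3] * l_seg
```

The discretized classification and segmentation heads are auxiliary: at inference only the regression head is needed, so the deployed model stays small enough for real-time mobile use.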