Real-time Appearance-based Gaze Estimation for Open Domains

📅 2026-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited generalization of existing appearance-based gaze estimation methods in open-domain scenarios—such as those involving eyeglasses or varying illumination—stemming from insufficient training data diversity and inconsistent labels across datasets. To overcome these challenges without requiring additional manual annotations, we propose a lightweight framework that enhances data diversity through synthetic augmentation with glasses, face masks, and complex lighting conditions. The gaze regression task is reformulated as a multi-task learning problem, integrating multi-view supervised contrastive learning, discrete label classification, and eye-region segmentation. Despite having less than 1% of the parameters of the state-of-the-art UniGaze-H model, our approach achieves comparable generalization performance. Furthermore, we introduce the first robustness evaluation benchmark for gaze estimation under challenging real-world conditions, enabling high-accuracy, real-time tracking on mobile devices.
📝 Abstract
Appearance-based gaze estimation (AGE) has achieved remarkable performance in constrained settings, yet we reveal a significant generalization gap where existing AGE models often fail in practical, unconstrained scenarios, particularly those involving facial wearables and poor lighting conditions. We attribute this failure to two core factors: limited image diversity and inconsistent label fidelity across different datasets, especially along the pitch axis. To address these, we propose a robust AGE framework that enhances generalization without requiring additional human-annotated data. First, we expand the image manifold via an ensemble of augmentation techniques, including synthesis of eyeglasses, masks, and varied lighting. Second, to mitigate the impact of anisotropic inter-dataset label deviation, we reformulate gaze regression as a multi-task learning problem, incorporating multi-view supervised contrastive (SupCon) learning, discretized label classification, and eye-region segmentation as auxiliary objectives. To rigorously validate our approach, we curate new benchmark datasets designed to evaluate gaze robustness under challenging conditions, a dimension largely overlooked by existing evaluation protocols. Our MobileNet-based lightweight model achieves generalization performance competitive with the state-of-the-art (SOTA) UniGaze-H, while utilizing less than 1% of its parameters, enabling high-fidelity, real-time gaze tracking on mobile devices.
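The abstract's reformulation of gaze regression as multi-task learning includes a discretized-label classification head. As a minimal sketch of that idea (the bin width, angle range, and function name here are illustrative assumptions, not the authors' values), continuous pitch/yaw angles can be quantized into class indices that serve as auxiliary classification targets alongside the regression target:

```python
import numpy as np

def discretize_gaze(angles_deg, bin_width=3.0, angle_range=(-42.0, 42.0)):
    """Map continuous gaze angles (degrees) to discrete bin indices.

    Illustrative only: mirrors the paper's idea of an auxiliary
    discretized-label classification objective; the 3-degree bins
    and +/-42-degree range are assumed, not taken from the paper.
    """
    lo, hi = angle_range
    # Clip so the upper boundary still falls in the last bin.
    clipped = np.clip(angles_deg, lo, hi - 1e-6)
    return ((clipped - lo) // bin_width).astype(int)

# Each (pitch, yaw) pair yields a pair of class indices for the
# classification head, used alongside the continuous regression loss.
gaze = np.array([[-10.0, 5.0], [0.0, 0.0]])
print(discretize_gaze(gaze))  # [[10 15], [14 14]]
```

Training against both the coarse bins and the exact angles is a common way to stabilize regression when per-dataset label fidelity varies, which matches the motivation stated above.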
Problem

Research questions and friction points this paper is trying to address.

gaze estimation
generalization gap
unconstrained scenarios
label fidelity
real-time
Innovation

Methods, ideas, or system contributions that make the work stand out.

appearance-based gaze estimation
multi-task learning
supervised contrastive learning
domain generalization
real-time mobile gaze tracking
Zhenhao Li
York University
Software Engineering · AIOps · Mining Software Repositories
Zheng Liu
Huawei Technologies Canada
Seunghyun Lee
University of Toronto
Amin Fadaeinejad
Huawei Technologies Canada
Yuanhao Yu
Huawei Technologies Canada