🤖 AI Summary
This work addresses a practical obstacle in AR/VR eye tracking: training a model for each new device requires costly, time-consuming real-world data collection, while existing synthetic data lacks realism. The authors propose GazePrior, a data-driven 3D prior that models the distribution of human eyes across identities, gaze directions, and lighting conditions. Using this prior, annotated data captured with earlier eye-tracking devices can be reconstructed in 3D from sparse inputs and then rendered from the cameras of any target device. The resulting training data aims to match the realism, diversity, and ground-truth accuracy of real data collection without requiring new acquisition. In the reported experiments, eye-tracking models trained on this synthesized data outperform previous zero-shot methods in both accuracy and robustness.
📝 Abstract
Eye tracking (ET) is a foundational technology for advanced AR/VR applications. However, training ET models for every new ET device is challenging: real data collection is costly and time-consuming, while existing synthetic data generation methods lack realism. To remove the need for additional data collection while maintaining data quality, we introduce a data-driven 3D prior that models the distribution of human eyes across diverse identities, gaze directions, and lighting conditions. This model, which we call GazePrior, enables sparse-input 3D reconstruction of annotated data collected with previous ET devices; the reconstructions can in turn be rendered from the cameras of any target ET device. Our approach synthesizes data with the realism, diversity, and ground-truth accuracy of real data collection without its prohibitive costs. Our experiments demonstrate that ET models trained with our synthesized data outperform previous zero-shot methods, achieving higher accuracy and robustness.
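To make the "render from the cameras of any target ET device" step more concrete, the sketch below shows only its geometric core: once annotated eye landmarks exist in 3D, their 2D labels for a new camera follow from a standard pinhole projection. This is not the GazePrior method, which the abstract describes as a learned 3D prior that produces rendered images. The `PinholeCamera` class, the `reproject_annotations` helper, the landmark names, and all numeric values are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


@dataclass
class PinholeCamera:
    """Hypothetical target-device camera (assumed ideal pinhole, no distortion)."""
    fx: float                              # focal length in pixels, x axis
    fy: float                              # focal length in pixels, y axis
    cx: float                              # principal point, x
    cy: float                              # principal point, y
    rotation: Sequence[Sequence[float]]    # 3x3 world-to-camera rotation, row-major
    translation: Vec3                      # world-to-camera translation

    def project(self, point_world: Vec3) -> Pixel:
        # Transform the point from the world frame into the camera frame.
        xc, yc, zc = (
            sum(r * p for r, p in zip(row, point_world)) + t
            for row, t in zip(self.rotation, self.translation)
        )
        if zc <= 0:
            raise ValueError("point lies behind the camera")
        # Perspective divide, then apply the intrinsics.
        return (self.fx * xc / zc + self.cx, self.fy * yc / zc + self.cy)


def reproject_annotations(
    landmarks_3d: Dict[str, Vec3], camera: PinholeCamera
) -> Dict[str, Pixel]:
    """Carry 3D annotations over to pixel coordinates in the target camera."""
    return {name: camera.project(p) for name, p in landmarks_3d.items()}


if __name__ == "__main__":
    # Illustrative values only: an eye roughly 5 cm in front of the camera.
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    target_cam = PinholeCamera(500.0, 500.0, 320.0, 240.0, identity, (0.0, 0.0, 0.0))
    landmarks = {
        "pupil_center": (0.0, 0.0, 0.05),
        "limbus_right": (0.01, 0.0, 0.05),
    }
    for name, uv in reproject_annotations(landmarks, target_cam).items():
        print(name, uv)
```

Because the annotations live in 3D, the same landmarks can be projected into any number of target cameras by swapping the camera parameters, which is the property that lets ground-truth labels transfer across devices without new collection.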