🤖 AI Summary
Eye-tracking (ET) hardware prototyping for AR/VR traditionally relies on costly real-world data collection, impeding rapid iteration. Method: This paper introduces the first end-to-end ET hardware performance simulation framework. It leverages NeRF-reconstructed, high-fidelity 3D eye models to synthesize multi-view, multi-parameter camera images, jointly modeling optical blur, illumination variations, and sensor noise, which enables purely synthetic, ML-based ET accuracy evaluation. Contribution/Results: The framework supports performance analysis under arbitrary camera poses, including extreme peripheral views, without fabricating physical hardware. It establishes, for the first time, a strong correlation (r = 0.92) between simulated and measured ET performance on real hardware (Project Aria). Compared with conventional hardware iteration cycles spanning months, the approach reduces evaluation time to days, substantially accelerating ET hardware design and optimization.
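To make the camera model concrete, here is a minimal sketch of the kind of blur/illumination/noise degradation stage the summary describes, applied to a rendered eye image. The function name, parameter names, and default values are illustrative assumptions, not the paper's calibrated settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade_image(image, blur_sigma_px=1.5, illum_gain=0.8,
                  full_well_e=5000.0, read_noise_e=3.0, rng=None):
    """Toy optical-blur + illumination + sensor-noise model for a
    rendered eye image (2D float array in [0, 1]). All defaults are
    illustrative, not calibrated to any real sensor."""
    rng = rng or np.random.default_rng()
    # 1. Optical blur: approximate the lens PSF with an isotropic Gaussian.
    img = gaussian_filter(image, sigma=blur_sigma_px)
    # 2. Illumination: scale scene brightness (e.g., dimmer LED drive).
    img = np.clip(img * illum_gain, 0.0, 1.0)
    # 3. Sensor noise: Poisson shot noise on photoelectrons plus
    #    Gaussian read noise, then renormalize to [0, 1].
    electrons = rng.poisson(img * full_well_e).astype(np.float64)
    electrons += rng.normal(0.0, read_noise_e, size=electrons.shape)
    return np.clip(electrons / full_well_e, 0.0, 1.0)
```

Sweeping such parameters over a grid of candidate configurations, then retraining and evaluating the gaze estimator on each degraded image set, is one plausible way to realize the purely synthetic evaluation loop described above.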
📝 Abstract
Eye tracking (ET) is a key enabler for Augmented and Virtual Reality (AR/VR). Prototyping new ET hardware requires assessing how hardware choices affect eye tracking performance. This assessment is complicated by the high cost of collecting data from sufficiently many variations of real hardware, especially for machine learning, which requires large training datasets. We propose a method for end-to-end evaluation of how hardware changes impact machine-learning-based ET performance using only synthetic data. We use a dataset of real 3D eyes, reconstructed from light-dome data using neural radiance fields (NeRF), to synthesize captured eyes from novel viewpoints and with novel camera parameters. Using this framework, we demonstrate that we can predict relative performance across various hardware configurations, accounting for variations in sensor noise, illumination brightness, and optical blur. We also compare our simulator against the publicly available eye tracking dataset from the Project Aria glasses, demonstrating a strong correlation with real-world performance. Finally, we present a first-of-its-kind analysis in which we vary ET camera positions, evaluating ET performance from on-axis direct views of the eye to peripheral views on the glasses frame. Such an analysis would previously have required manufacturing physical devices to capture evaluation data. In short, our method enables faster prototyping of ET hardware.
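As a rough illustration of the camera-position analysis, the sketch below sweeps camera poses on an arc from an on-axis view of the eye toward peripheral placements. The look_at helper, the 3 cm eye-to-camera distance, and the 15-degree steps are assumptions for illustration, not values from the paper.

```python
import numpy as np

def look_at(cam_pos, target, up=np.array([0.0, 1.0, 0.0])):
    """World-to-camera rotation and translation for a camera at cam_pos
    looking at target (standard look-at construction, OpenGL-style -z)."""
    fwd = target - cam_pos
    fwd = fwd / np.linalg.norm(fwd)
    right = np.cross(fwd, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, fwd)
    R = np.stack([right, true_up, -fwd])  # rows are camera axes in world frame
    t = -R @ cam_pos
    return R, t

# Sweep poses on an arc around the eye center: 0 deg is the on-axis
# direct view; larger angles move toward peripheral frame placements.
eye_center = np.zeros(3)
radius_m = 0.03  # assumed eye-to-camera distance, for illustration only
for angle_deg in range(0, 75, 15):
    a = np.radians(angle_deg)
    cam_pos = radius_m * np.array([np.sin(a), 0.0, np.cos(a)])
    R, t = look_at(cam_pos, eye_center)
    # (R, t) would parameterize the NeRF render for this candidate pose,
    # after which the degraded images feed the ET evaluation pipeline.
```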