GazeShift: Unsupervised Gaze Estimation and Dataset for VR

📅 2026-03-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two limitations of current VR eye tracking: the absence of large-scale, precisely annotated off-axis near-eye image datasets, and the difficulty of gaze annotation, since fixation on intended targets cannot be guaranteed. To overcome these challenges, we introduce VRGaze, the first large-scale off-axis eye-tracking dataset in VR, and propose GazeShift, an attention-guided unsupervised representation learning framework that disentangles eye appearance from gaze direction without ground-truth labels, multi-view inputs, or 3D geometry. GazeShift supports lightweight few-shot calibration for individual users and runs natively on VR headset GPUs with only 5 ms inference latency. Experiments show a mean angular error of 1.84° on VRGaze and a cross-person error of 7.15° on MPIIGaze, with 10× fewer parameters and 35× fewer FLOPs than baseline methods.
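
The page does not detail the few-shot calibration step. Below is a minimal sketch of one common way such calibration is realized: fitting a small linear readout on frozen embeddings from a handful of user-specific fixation samples. The encoder interface, `calib_images`, `calib_targets`, and all hyperparameters are assumptions for illustration, not the paper's actual procedure.

```python
# Hypothetical sketch: per-user few-shot calibration on top of frozen
# GazeShift-style embeddings. Assumes a pretrained encoder that maps a
# near-eye IR image to a gaze embedding; only a tiny linear head is fit
# per user from a few labeled fixation points.
import torch
import torch.nn as nn

def calibrate(encoder: nn.Module,
              calib_images: torch.Tensor,   # (N, 1, H, W) few-shot IR frames
              calib_targets: torch.Tensor,  # (N, 2) gaze angles (yaw, pitch)
              steps: int = 200,
              lr: float = 1e-2) -> nn.Linear:
    encoder.eval()
    with torch.no_grad():                   # embeddings stay frozen
        z = encoder(calib_images)           # (N, D) gaze embeddings
    head = nn.Linear(z.shape[1], 2)         # user-specific gaze readout
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(head(z), calib_targets)
        loss.backward()
        opt.step()
    return head
```

At inference time, gaze would then be read out as `head(encoder(frame))`; the paper's actual calibration procedure may differ from this sketch.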

📝 Abstract
Gaze estimation is instrumental in modern virtual reality (VR) systems. Despite significant progress in remote-camera gaze estimation, VR gaze research remains constrained by data scarcity, particularly the lack of large-scale, accurately labeled datasets captured with the off-axis camera configurations typical of modern headsets. Gaze annotation is difficult since fixation on intended targets cannot be guaranteed. To address these challenges, we introduce VRGaze, the first large-scale off-axis gaze estimation dataset for VR, comprising 2.1 million near-eye infrared images collected from 68 participants. We further propose GazeShift, an attention-guided unsupervised framework for learning gaze representations without labeled data. Unlike prior redirection-based methods that rely on multi-view or 3D geometry, GazeShift is tailored to near-eye infrared imagery, achieving effective gaze-appearance disentanglement in a compact, real-time model. GazeShift embeddings can be optionally adapted to individual users via lightweight few-shot calibration, achieving a 1.84-degree mean error on VRGaze. On the remote-camera MPIIGaze dataset, the model achieves a 7.15-degree person-agnostic error, doing so with 10x fewer parameters and 35x fewer FLOPs than baseline methods. Deployed natively on a VR headset GPU, inference takes only 5 ms. Combined with demonstrated robustness to illumination changes, these results highlight GazeShift as a label-efficient, real-time solution for VR gaze tracking. Project code and the VRGaze dataset are released at https://github.com/gazeshift3/gazeshift.
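
The abstract describes gaze-appearance disentanglement only at a high level. A minimal sketch of one standard unsupervised recipe for this kind of disentanglement, swap-and-reconstruct between two frames of the same eye, is given below; the architecture, latent split, and 32×32 input size are illustrative assumptions, not GazeShift's actual design.

```python
# Illustrative swap-and-reconstruct disentanglement sketch (not the
# paper's architecture): an encoder splits each near-eye image into a
# gaze code and an appearance code; reconstructing one frame from the
# other frame's appearance code plus its own gaze code encourages the
# gaze code to carry only gaze information. Assumes 1x32x32 inputs.
import torch
import torch.nn as nn

class DisentangleAE(nn.Module):
    def __init__(self, gaze_dim: int = 8, app_dim: int = 56):
        super().__init__()
        self.gaze_dim = gaze_dim
        self.backbone = nn.Sequential(          # tiny stand-in encoder
            nn.Conv2d(1, 16, 4, 2, 1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(16, 32, 4, 2, 1), nn.ReLU(),  # 16 -> 8
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, gaze_dim + app_dim))
        self.decoder = nn.Sequential(           # stand-in decoder
            nn.Linear(gaze_dim + app_dim, 32 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (32, 8, 8)),
            nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(16, 1, 4, 2, 1))              # 16 -> 32

    def split(self, x: torch.Tensor):
        z = self.backbone(x)
        return z[:, :self.gaze_dim], z[:, self.gaze_dim:]

def redirection_loss(model: DisentangleAE,
                     img_a: torch.Tensor,
                     img_b: torch.Tensor) -> torch.Tensor:
    # img_a, img_b: two frames of the same eye with different gaze.
    _, app_a = model.split(img_a)
    gaze_b, _ = model.split(img_b)
    # Reconstruct frame B from A's appearance and B's gaze code.
    recon_b = model.decoder(torch.cat([gaze_b, app_a], dim=1))
    return nn.functional.mse_loss(recon_b, img_b)
```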
Problem

Research questions and friction points this paper is trying to address.

gaze estimation
virtual reality
data scarcity
off-axis camera
gaze annotation
Innovation

Methods, ideas, or system contributions that make the work stand out.

unsupervised gaze estimation
off-axis near-eye imaging
gaze-appearance disentanglement
few-shot calibration
real-time VR tracking
👥 Authors
Gil Shapira
The World Bank
Ishay Goldin
Samsung Semiconductor Israel R&D Center (SIRC)
Evgeny Artyomov
Samsung Semiconductor Israel R&D Center (SIRC)
Donghoon Kim
Samsung Electronics
Yosi Keller
Faculty of Engineering, Bar Ilan University
Research interests: image processing, signal processing, machine learning, dimensionality reduction
Niv Zehngut
Samsung Semiconductor Israel R&D Center (SIRC)