GazeD: Context-Aware Diffusion for Accurate 3D Gaze Estimation

📅 2026-01-19
🤖 AI Summary
This work proposes a conditional diffusion-based approach that jointly estimates 3D gaze direction and human body pose from a single RGB image. It models 3D gaze as a virtual joint placed at a fixed distance from the eyes and denoises it together with the body pose, conditioning on the 2D pose, the subject's surroundings, and the scene context to generate multiple plausible hypotheses. On three benchmark datasets, the method achieves state-of-the-art performance in 3D gaze estimation, surpassing even approaches that rely on temporal information, although it uses only a single frame as input.

📝 Abstract
We introduce GazeD, a new 3D gaze estimation method that jointly provides 3D gaze and human pose from a single RGB image. Leveraging the ability of diffusion models to deal with uncertainty, it generates multiple plausible 3D gaze and pose hypotheses based on the 2D context information extracted from the input image. Specifically, we condition the denoising process on the 2D pose, the surroundings of the subject, and the context of the scene. With GazeD we also introduce a novel way of representing the 3D gaze by positioning it as an additional body joint at a fixed distance from the eyes. The rationale is that the gaze is usually closely related to the pose, and thus it can benefit from being jointly denoised during the diffusion process. Evaluations across three benchmark datasets demonstrate that GazeD achieves state-of-the-art performance in 3D gaze estimation, even surpassing methods that rely on temporal information. Project details will be available at https://aimagelab.ing.unimore.it/go/gazed.
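The gaze-as-joint representation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of a single eye reference point as the origin of the gaze ray, and the value of `GAZE_JOINT_DISTANCE` are all assumptions made for illustration, since the abstract only states that the gaze joint lies at a fixed distance from the eyes.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical value: the source does not state the actual fixed distance.
GAZE_JOINT_DISTANCE = 1.0


def normalize(v: Sequence[float]) -> Vec3:
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(c / norm for c in v)


def gaze_to_joint(eye: Vec3, direction: Sequence[float],
                  distance: float = GAZE_JOINT_DISTANCE) -> Vec3:
    """Place a virtual joint `distance` away from the eye point along the gaze."""
    d = normalize(direction)
    return tuple(e + distance * c for e, c in zip(eye, d))


def joint_to_gaze(eye: Vec3, gaze_joint: Vec3) -> Vec3:
    """Recover the unit gaze direction from the eye point and the virtual joint."""
    return normalize(tuple(j - e for j, e in zip(gaze_joint, eye)))


def append_gaze_joint(pose: List[Vec3], eye: Vec3, direction: Sequence[float],
                      distance: float = GAZE_JOINT_DISTANCE) -> List[Vec3]:
    """Return the 3D pose with the gaze joint appended as one extra joint."""
    return list(pose) + [gaze_to_joint(eye, direction, distance)]
```

Under these assumptions, a pose with J joints becomes a set of J + 1 points, so a model that denoises joint coordinates can treat the gaze like any other joint, and the gaze direction is read back with `joint_to_gaze` after denoising.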
Problem

Research questions and friction points this paper is trying to address.

3D gaze estimation
gaze prediction
RGB image
human pose
computer vision
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion model
3D gaze estimation
context-aware conditioning
joint pose and gaze representation
single-image inference
Authors

Riccardo Catalini
University of Modena and Reggio Emilia

Davide Di Nucci
University of Modena and Reggio Emilia
Computer Vision

G. Borghi
University of Modena and Reggio Emilia

Davide Davoli
Toyota Motor Europe

Lorenzo Garattoni
Toyota Motor Europe
Robotics, Artificial Intelligence, Computer Vision

Giampiero Francesca
Toyota Motor Europe

Yuki Kawana
Machine learning engineer, Woven by Toyota
Machine Learning

R. Vezzani
University of Modena and Reggio Emilia