π€ AI Summary
Existing gaze redirection methods struggle to simultaneously achieve 3D consistency, high generation quality, and efficient inference. This paper proposes the first real-time, 3D-aware gaze redirection framework that, given a single facial image and a target gaze prompt, learns a disentangled and controllable facial representation. We introduce a neural rendering decoder and incorporate geometric priors from a pre-trained 3D portrait generator via knowledge distillation, significantly improving 3D consistency and fine-detail fidelity. Our method achieves state-of-the-art performance across multiple benchmarks: gaze redirection error is reduced by 21%, FID improves by 18%, and inference speed reaches 0.06 seconds per frameβ800Γ faster than prior best approaches. To our knowledge, this is the first method enabling high-fidelity, geometrically consistent gaze editing at real-time speeds.
π Abstract
Gaze redirection methods aim to generate realistic human face images with controllable eye movement. However, recent methods often struggle with 3D consistency, efficiency, or quality, limiting their practical applications. In this work, we propose RTGaze, a real-time and high-quality gaze redirection method. Our approach learns a gaze-controllable facial representation from face images and gaze prompts, then decodes this representation via neural rendering for gaze redirection. Additionally, we distill face geometric priors from a pretrained 3D portrait generator to enhance generation quality. We evaluate RTGaze both qualitatively and quantitatively, demonstrating state-of-the-art performance in efficiency, redirection accuracy, and image quality across multiple datasets. Our system achieves real-time, 3D-aware gaze redirection with a feedforward network (~0.06 sec/image), making it 800x faster than the previous state-of-the-art 3D-aware methods.